HUMAN DNA MISMATCH REPAIR PROTEINS



This invention relates to newly identified 
polynucleotides, polypeptides encoded by such 
polynucleotides, the use of such polynucleotides and 
polypeptides, as well as the production of such 
polynucleotides and polypeptides. More particularly, the 
polypeptides of the present invention are human homologs of
the prokaryotic mutL gene and are hereinafter referred to as
hMLH1, hMLH2 and hMLH3.

WO 95/20678                                    PCT/US95/01035

In both prokaryotes and eukaryotes, the DNA mismatch
repair system plays a prominent role in the correction of
errors made during DNA replication and genetic recombination.
The E. coli methyl-directed DNA mismatch repair system is the
best understood DNA mismatch repair system to date. In
E. coli, this repair pathway involves the products of the
mutator genes mutS, mutL, mutH and uvrD. Mutations in any one
of these genes produce a mutator phenotype. MutS is a
DNA mismatch-binding protein which initiates this repair
process; UvrD is a DNA helicase; and MutH is a latent
endonuclease that incises the unmethylated strand of a
hemimethylated GATC sequence. MutL protein is believed to
recognize and bind to the mismatch-DNA-MutS-MutH complex to
enhance the endonuclease activity of MutH protein. After the
unmethylated DNA strand is cut by MutH, single-stranded
DNA-binding protein, DNA polymerase III, exonuclease I and
DNA ligase are required to complete this repair process
(Modrich P., Annu. Rev. Genetics, 25:229-53 (1991)).
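As a small illustration of why a GATC site can be hemimethylated, the Python sketch below checks that GATC is its own reverse complement (so the same site is present on both strands) and locates GATC sites in a sequence. The sequence and function names are hypothetical; the fact that E. coli methylates the adenine of GATC through Dam methylase is background knowledge not stated in the passage above.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T string."""
    pairs = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(seq))


def find_sites(seq: str, site: str) -> list[int]:
    """0-based start positions of every (possibly overlapping) occurrence of site."""
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


# GATC is palindromic: each site is read identically on both strands,
# so each strand carries its own adenine that can be methylated.
assert reverse_complement("GATC") == "GATC"

top_strand = "AAGATCTTGATC"  # hypothetical sequence
print(find_sites(top_strand, "GATC"))  # [2, 8]
```

Because both strands of each site carry an adenine, a site shortly after replication can be methylated on the parental strand only; that is the hemimethylated state referred to above.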

Elements of the E. coli MutHLS system appear to be
conserved during evolution in prokaryotes and eukaryotes.
Genetic analysis suggests that Saccharomyces cerevisiae
has a mismatch repair system similar to the bacterial MutHLS
system. In S. cerevisiae, at least two MutL homologs, PMS1
and MLH1, have been reported. Mutation of either one of them
leads to a mitotic mutator phenotype (Prolla et al., Mol.
Cell. Biol. 14:407-415 (1994)). At least three MutS homologs
have been found in S. cerevisiae, namely MSH1, MSH2 and MSH3.
Disruption of the MSH2 gene affects nuclear mutation rates.
S. cerevisiae MSH2, PMS1 and MLH1 mutants have been found
to exhibit increased rates of expansion and contraction of
dinucleotide repeat sequences (Strand et al., Nature,
365:274-276 (1993)).
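The expansion and contraction of dinucleotide repeats can be pictured by counting repeat units at a single locus. The following is a minimal sketch, assuming a hypothetical (CA)n locus and exact, uninterrupted repeats; in the laboratory, microsatellite length changes are typically scored by sizing PCR products rather than by string matching.

```python
import re


def longest_repeat_units(seq: str, unit: str) -> int:
    """Number of tandem copies in the longest uninterrupted run of `unit`."""
    runs = re.findall(f"(?:{re.escape(unit)})+", seq)
    return max((len(run) // len(unit) for run in runs), default=0)


# Hypothetical alleles of one (CA)n microsatellite locus.
normal = "GGT" + "CA" * 12 + "TTG"
mutant = "GGT" + "CA" * 9 + "TTG"  # contraction by three repeat units

change = longest_repeat_units(mutant, "CA") - longest_repeat_units(normal, "CA")
print(change)  # -3
```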

It has been reported that a number of human tumors, such
as lung cancer, prostate cancer, ovarian cancer, breast
cancer, colon cancer and stomach cancer, show instability of
repeated DNA sequences (Han et al., Cancer, 53:5087-5089
(1993); Thibodeau et al., Science 260:816-819 (1993);
Risinger et al., Cancer 53:5100-5103 (1993)). This
phenomenon suggests that defective DNA mismatch repair is
probably a cause of these tumors.

Little was known about the DNA mismatch repair system in
humans until recently, when the human homolog of the MutS gene
was cloned and found to be responsible for hereditary
nonpolyposis colon cancer (HNPCC) (Fishel et al., Cell,
75:1027-1038 (1993) and Leach et al., Cell, 75:1215-1225
(1993)). HNPCC was first linked to a locus at chromosome
2p16 which causes dinucleotide instability. It was then
demonstrated that a DNA mismatch repair protein (MutS)
homolog was located at this locus, and that C to T
transition mutations at several conserved regions were
specifically observed in HNPCC patients. Hereditary
nonpolyposis colorectal cancer is one of the most common
heritable diseases of man, affecting as many as one in two
hundred individuals in the western world.
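The C to T changes mentioned above are transitions: substitutions within the same base class (purine for purine, or pyrimidine for pyrimidine), as opposed to transversions, which swap a purine for a pyrimidine or vice versa. A minimal Python sketch of that classification, with illustrative names only:

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(ref: str, alt: str) -> str:
    """Label a single-base substitution as a transition or a transversion."""
    if ref == alt:
        raise ValueError("reference and alternate bases are identical")
    # A transition keeps the base within its class (purine or pyrimidine).
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(classify_substitution("C", "T"))  # transition
print(classify_substitution("C", "A"))  # transversion
```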

It has been demonstrated that hereditary colon cancer
can result from mutations in several loci. Familial
adenomatous polyposis coli (APC), linked to a gene on
chromosome 5, is responsible for a small minority of
hereditary colon cancer. Hereditary colon cancer is also
associated with Gardner's syndrome, Turcot's syndrome,
Peutz-Jeghers syndrome and juvenile polyposis coli. In addition,
hereditary nonpolyposis colon cancer may be involved in 5% of
all human colon cancer. All of the different types of
familial colon cancer have been shown to be transmitted by an
autosomal dominant mode of inheritance.

In addition to the localization of HNPCC to the short arm
of chromosome 2, a second locus has been linked to a
predisposition to HNPCC (Lindblom et al., Nature Genetics,
5:279-282 (1993)). A strong linkage was demonstrated between
a polymorphic marker on the short arm of chromosome 3 and the
disease locus.

This finding suggests that mutations in various DNA
mismatch repair proteins probably play crucial roles in the
development of human hereditary diseases and cancers.

HNPCC is characterized clinically by an apparent
autosomal dominantly inherited predisposition to cancer of
the colon, endometrium and other organs (Lynch, H.T. et
al., Gastroenterology, 104:1535-1549 (1993)). The
identification of markers at 2p16 and 3p21-22 which were
linked to disease in selected HNPCC kindreds unequivocally
established its Mendelian nature (Peltomaki, P. et al.,
Science, 260:810-812 (1993)). Tumors from HNPCC patients are
characterized by widespread alterations of simple repeated
sequences (microsatellites) (Aaltonen et al., Science,
260:812-816 (1993)). This type of genetic instability was
originally observed in a subset (12 to 18%) of sporadic
colorectal cancers (Id.). Studies in bacteria and yeast
indicated that a defect in DNA mismatch repair genes can
result in a similar instability of microsatellites (Levinson,
G. and Gutman, G.A., Nuc. Acids Res., 15:5325-5338 (1987)),
and it was hypothesized that deficiency in mismatch repair
was responsible for HNPCC (Strand, M. et al., Nature,
365:274-276 (1993)). Analysis of extracts from HNPCC tumor
cell lines showed that mismatch repair was indeed deficient,
adding definitive support to this conjecture (Parsons, R.P.,
et al., Cell, 75:1227-1236 (1993)). As not all HNPCC kindreds
can be linked to the same loci, and as at least three genes
can produce a similar phenotype in yeast, it seems likely
that other mismatch repair genes could play a role in some
cases of HNPCC.

hMLH1 is most homologous to the yeast mutL homolog yMLH1,
while hMLH2 and hMLH3 have greater homology to the yeast
mutL homolog yPMS1 (because of this homology to the yeast
PMS1 gene, hMLH2 and hMLH3 are sometimes referred to in the
literature as hPMS1 and hPMS2). In addition to hMLH1, both
the hMLH2 gene on chromosome 2q32 and the hMLH3 gene on
chromosome 7p22 were found to be mutated in the germ line of
HNPCC patients. This doubles the number of genes implicated in
HNPCC and may help explain the relatively high incidence of
this disease.

In accordance with one aspect of the present invention,
there are provided novel putative mature polypeptides which
are hMLH1, hMLH2 and hMLH3, as well as biologically active
and diagnostically or therapeutically useful fragments,
analogs and derivatives thereof. The polypeptides of the
present invention are of human origin.

In accordance with another aspect of the present 
invention, there are provided isolated nucleic acid molecules 
encoding such polypeptides, including mRNAs, DNAs, cDNAs, 
genomic DNA as well as biologically active and diagnostically 
or therapeutically useful fragments, analogs and derivatives 
thereof.

In accordance with still another aspect of the present 
invention there are provided nucleic acid probes comprising 
nucleic acid molecules of sufficient length to specifically 
hybridize to hMLH1, hMLH2 and hMLH3 sequences.

In accordance with yet a further aspect of the present
invention, there is provided a process for producing such
polypeptides by recombinant techniques which comprises
culturing recombinant prokaryotic and/or eukaryotic host
cells containing an hMLH1, hMLH2 or hMLH3 nucleic acid
sequence, under conditions promoting expression of said
proteins, and subsequently recovering said proteins.

In accordance with yet a further aspect of the present 
invention, there is provided a process for utilizing such 
polypeptide, or polynucleotide encoding such polypeptide, for 
therapeutic purposes, for example, for the treatment of
cancers.

In accordance with another aspect of the present 
invention there is provided a method of diagnosing a disease 
or a susceptibility to a disease related to a mutation in the 
hMLH1, hMLH2 or hMLH3 nucleic acid sequences and the proteins
encoded by such nucleic acid sequences.

In accordance with yet a further aspect of the present 
invention, there is provided a process for utilizing such 
polypeptides, or polynucleotides encoding such polypeptides, 
for in vitro purposes related to scientific research,
synthesis of DNA and manufacture of DNA vectors. 
These and other aspects of the present invention should
be apparent to those skilled in the art from the teachings
herein.

The following drawings are illustrative of embodiments 
of the invention and are not meant to limit the scope of the
invention as encompassed by the claims. 

Figure 1 illustrates the cDNA sequence and corresponding
deduced amino acid sequence for the human DNA repair protein
hMLH1. The amino acids are represented by their standard
one-letter abbreviations. Sequencing was performed using a
373 Automated DNA sequencer (Applied Biosystems, Inc.).
Sequencing accuracy is predicted to be greater than 97%.

Figure 2 illustrates the cDNA sequence and corresponding 
deduced amino acid sequence of hMLH2. The amino acids are
represented by their standard one-letter abbreviations. 

Figure 3 illustrates the cDNA sequence and corresponding 
deduced amino acid sequence of hMLH3. The amino acids are 
represented by their standard one-letter abbreviations. 

Figure 4. Alignment of the predicted amino acid
sequences of S. cerevisiae PMS1 (yPMS1) with the hMLH2 and
hMLH3 amino acid sequences using the MACAW (version 1.0)
program. Amino acids in conserved blocks are capitalized and
shaded according to the mean of their pairwise scores.

Figure 5. Mutational analysis of hMLH2. (A) IVSP
analysis and mapping of the translational stop mutation in
HNPCC patient CW. Translation of codons 1 to 369 (lane 1),
codons 1 to 290 (lane 2), and codons 1 to 214 (lane 3). CW
is translated from the cDNA of patient CW, while NOR was
translated from the cDNA of a normal individual. The
arrowheads indicate the truncated polypeptide due to the
potential stop mutation. The arrows indicate molecular
weight markers in kilodaltons. (B) Sequence analysis of CW
indicates a C to T transition at codon 233 (indicated by the
arrow). Lanes 1 and 3 are sequences derived from control
patients; lane 2 is sequence derived from genomic DNA of CW.
The ddA mixes from each sequencing mix were loaded in
adjacent lanes to facilitate comparison, as were those for
the ddC, ddG and ddT mixes.
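In the IVSP assay of Figure 5, a single C to T transition is detectable because it can create a premature stop codon and therefore a shorter in vitro translation product. The codon actually altered at position 233 is not identified in this section, so the sketch below, which builds the standard genetic code, only lists which hypothetical codons would become stop codons after one C to T change.

```python
from itertools import product

# Standard genetic code; codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def c_to_t_nonsense(codon: str) -> list[int]:
    """Positions (0-2) where a single C->T change turns `codon` into a stop ('*')."""
    hits = []
    for i, base in enumerate(codon):
        if base == "C":
            mutant = codon[:i] + "T" + codon[i + 1:]
            if CODON_TABLE[mutant] == "*":
                hits.append(i)
    return hits


for codon in ("CGA", "CAG", "CGG"):  # hypothetical example codons
    print(codon, CODON_TABLE[codon], c_to_t_nonsense(codon))
# CGA R [0]  (C->T gives TGA, a stop)
# CAG Q [0]  (C->T gives TAG, a stop)
# CGG R []   (C->T gives TGG, Trp, not a stop)
```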

Figure 6. Mutational analysis of hMLH3. (A) IVSP
analysis of hMLH3 from patient GC. Lane GC is from
fibroblasts of individual GC; lane GCx is from the tumor of
patient GC; lanes NOR1 and NOR2 are from normal control
individuals. FL indicates full-length protein, and the
arrowheads indicate the germ line truncated polypeptide. The
arrows indicate molecular weight markers in kilodaltons. (B)
PCR analysis of DNA from patient GC shows that the lesion
is present in both hMLH3 alleles in tumor cells.
Amplification was done using primers that amplify 5', 3', or
within (MID) the region deleted in the cDNA. Lane 1, DNA
derived from fibroblasts of patient GC; lane 2, DNA derived
from tumor of patient GC; lane 3, DNA derived from a normal
control patient; lane 4, reactions without DNA template.
Arrows indicate molecular weight in base pairs.

In accordance with an aspect of the present invention,
there are provided isolated nucleic acids (polynucleotides)
which encode for the mature polypeptides having the deduced
amino acid sequences of Figures 1, 2 and 3 (SEQ ID No. 2, 4
and 6) or for the mature polypeptides encoded by the cDNAs of
the clones deposited as ATCC Deposit Nos. 75649, 75651 and
75650 on January 25, 1994.

ATCC Deposit No. 75649 is a cDNA clone which contains
the full length sequence encoding the human DNA repair
protein referred to herein as hMLH1; ATCC Deposit No. 75651
is a cDNA clone containing the full length cDNA sequence
encoding the human DNA repair protein referred to herein as
hMLH2; ATCC Deposit No. 75650 is a cDNA clone containing the
full length DNA sequence referred to herein as hMLH3.

Polynucleotides encoding the polypeptides of the present
invention may be obtained from one or more libraries prepared
from heart, lung, prostate, spleen, liver, gallbladder, fetal
brain and testes tissues. The polynucleotides of hMLH1 were
discovered from a human gallbladder cDNA library. In
addition, six cDNA clones which are identical to hMLH1 at
the N-terminal ends were obtained from human cerebellum,
eight-week embryo, fetal heart, HSC172 cells and Jurkat cell
cDNA libraries. The hMLH1 gene contains an open reading
frame encoding a 756-amino-acid, 85 kD protein which
exhibits homology to the bacterial and yeast mutL proteins.
The 5' non-translated region, however, was obtained from the
cDNA clone derived from fetal heart, in order to extend the
non-translated region for the design of oligonucleotides.
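The approximately 85 kD size quoted for the 756-residue hMLH1 product can be sanity-checked with the common rule of thumb of about 110 Da per amino acid residue. This is only a rough approximation under that assumption: actual mass depends on amino acid composition, and the text does not state how its value was determined.

```python
AVG_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average residue mass; real proteins vary


def approx_mass_kda(residues: int) -> float:
    """Rough polypeptide mass in kilodaltons from its length in residues."""
    return residues * AVG_RESIDUE_MASS_DA / 1000.0


print(round(approx_mass_kda(756), 1))  # 83.2
```

The estimate of roughly 83 kDa is in line with the approximately 85 kD stated above.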

The hMLH2 gene was derived from a human T-cell lymphoma
cDNA library. The hMLH2 cDNA clone identified an open
reading frame of 2,796 base pairs flanked on both sides by
in-frame termination codons. It is structurally related to
the yeast PMS1 family. It contains an open reading frame
encoding a protein of 934 amino acid residues. The protein
exhibits the highest degree of homology to yeast PMS1, with
27% identity and 82% similarity over the entire protein.
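An open reading frame of this kind can be located computationally by scanning each reading frame from an ATG start codon to the next in-frame stop codon. The Python sketch below is a simplified illustration under stated assumptions: it uses the standard genetic code, scans only the three forward frames, accepts only uppercase A/C/G/T, does not check for the upstream in-frame stop codon, and is run on a hypothetical sequence.

```python
from itertools import product

# Standard genetic code; codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def longest_orf(seq: str) -> tuple[int, int]:
    """(start, end) of the longest ATG-initiated ORF in the three forward frames.

    `end` is exclusive and includes the terminating stop codon; (0, 0) means
    no complete ORF was found.
    """
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            aa = CODON_TABLE[seq[i:i + 3]]
            if start is None and aa == "M":
                start = i  # first ATG opens the frame
            elif start is not None and aa == "*":
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None  # look for the next ATG in this frame
    return best


seq = "CCATGAAACCCTAGTT"  # hypothetical
print(longest_orf(seq))  # (2, 14), i.e. ATG AAA CCC TAG
```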

A second region of significant homology among the three
PMS-related proteins is in the carboxyl terminus, between
codons 800 and 900. This region shares 22% and 47% homology
between yeast PMS1 protein and the hMLH2 and hMLH3 proteins,
respectively, while very little homology in this region was
observed between these proteins and the other yeast mutL
homolog, yMLH1.

The hMLH3 gene was derived from a human endometrial
tumor cDNA library. The hMLH3 clone identified a 2,586 base
pair open reading frame. It is structurally related to the
yPMS2 protein family. It contains an open reading frame
encoding a protein of 862 amino acid residues. The protein
exhibits the highest degree of homology to yPMS2, with 32%
identity and 66% similarity over the entire amino acid
sequence.

It is significant with respect to a putative
identification of hMLH1, hMLH2 and hMLH3 that the GFRGEAL
domain, which is conserved in mutL homologs derived from
E. coli, is conserved in the amino acid sequences of hMLH1,
hMLH2 and hMLH3.
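Checking a deduced amino acid sequence for the conserved GFRGEAL motif amounts to a substring search. A minimal sketch, assuming exact matches only (no tolerance for substitutions) and a hypothetical toy fragment rather than any of the hMLH sequences:

```python
import re

MOTIF = "GFRGEAL"  # conserved MutL-family motif named in the text


def motif_positions(protein: str, motif: str = MOTIF) -> list[int]:
    """1-based residue numbers at which `motif` begins (overlaps allowed)."""
    # A zero-width lookahead lets overlapping occurrences all be reported.
    return [m.start() + 1 for m in re.finditer(f"(?={re.escape(motif)})", protein)]


toy = "MKTAYIGFRGEALASISHV"  # hypothetical fragment, not an hMLH sequence
print(motif_positions(toy))  # [7]
```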

The polynucleotides of the present invention may be in
the form of RNA or in the form of DNA, which DNA includes
cDNA, genomic DNA and synthetic DNA. The DNA may be
double-stranded or single-stranded, and if single-stranded may
be the coding strand or the non-coding (anti-sense) strand. The
coding sequence which encodes the mature polypeptide may be
identical to the coding sequence shown in Figures 1, 2 and 3
(SEQ ID No. 1, 3 and 5) or that of the deposited clone, or may
be a different coding sequence which, as a result
of the redundancy or degeneracy of the genetic code, encodes
the same mature polypeptides as the DNA of Figures 1, 2 and
3 (SEQ ID No. 2, 4 and 6) or the deposited cDNA(s).
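The redundancy of the genetic code means that quite different coding sequences can encode the same polypeptide. The sketch below builds the standard genetic code table and translates two hypothetical 21-nucleotide coding sequences that both encode the seven-residue GFRGEAL motif; neither sequence is taken from the Figures.

```python
from itertools import product

# Standard genetic code; codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence with the standard genetic code."""
    if len(cds) % 3:
        raise ValueError("length is not a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


# Two hypothetical coding sequences using different synonymous codons.
seq_a = "GGTTTTCGTGGTGAAGCTCTT"  # GGT TTT CGT GGT GAA GCT CTT
seq_b = "GGCTTCAGAGGGGAGGCGTTG"  # GGC TTC AGA GGG GAG GCG TTG
print(translate(seq_a), translate(seq_b))  # GFRGEAL GFRGEAL
```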

The polynucleotides which encode for the mature 
polypeptides of Figures 1, 2 and 3 (SEQ ID No. 2, 4 and 6) or 
for the mature polypeptides encoded by the deposited cDNAs 
may include: only the coding sequence for the mature 
polypeptide; the coding sequence for the mature polypeptide 
(and optionally additional coding sequence) and non-coding 
sequence, such as introns or non-coding sequence 5' and/or 3' 
of the coding sequence for the mature polypeptide. 

Thus, the term "polynucleotide encoding a polypeptide" 
encompasses a polynucleotide which includes only coding 
sequence for the polypeptide as well as a polynucleotide 
which includes additional coding and/or non-coding sequence. 

The present invention further relates to variants of the 
hereinabove described polynucleotides which encode for 
fragments, analogs and derivatives of the polypeptides having 
the deduced amino acid sequences of Figures 1, 2 and 3 (SEQ
ID No. 2, 4 and 6) or the polypeptides encoded by the cDNA of
the deposited clones. The variants of the polynucleotides
may be naturally occurring allelic variants of the
polynucleotides or non-naturally occurring variants of the
polynucleotides.

Thus, the present invention includes polynucleotides 
encoding the same mature polypeptides as shown in Figures 1,
2 and 3 (SEQ ID No. 2, 4 and 6) or the same mature
polypeptides encoded by the cDNA of the deposited clones as
well as variants of such polynucleotides which variants
encode for a fragment, derivative or analog of the
polypeptides of Figures 1, 2 and 3 (SEQ ID No. 2, 4 and 6) or
the polypeptides encoded by the cDNA of the deposited clones. 
Such nucleotide variants include deletion variants, 
substitution variants and addition or insertion variants. 

As hereinabove indicated, the polynucleotides may have
a coding sequence which is a naturally occurring allelic 
variant of the coding sequence shown in Figures 1, 2 and 3 
(SEQ ID No. 1, 3 and 5) or of the coding sequence of the 
deposited clones. As known in the art, an allelic variant is 
an alternate form of a polynucleotide sequence which may have 
a substitution, deletion or addition of one or more 
nucleotides, which does not substantially alter the function 
of the encoded polypeptide. 

The polynucleotides of the present invention may also 
have the coding sequence fused in frame to a marker sequence 
which allows for purification of the polypeptides of the 
present invention. The marker sequence may be, for example, 
a hexa-histidine tag supplied by a pQE-9 vector to provide 
for purification of the mature polypeptides fused to the 
marker in the case of a bacterial host, or, for example, the 
marker sequence may be a hemagglutinin (HA) tag when a 
mammalian host, e.g. COS-7 cells, is used. The HA tag
corresponds to an epitope derived from the influenza
hemagglutinin protein (Wilson, I., et al., Cell, 27:161
(1984)).
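Fusing a marker sequence in frame requires that the junction preserve the reading frame, which for a 5' tag means the tag's coding sequence must be a whole number of codons. The sketch below uses a hypothetical minimal tag (ATG followed by six CAT histidine codons) and a hypothetical coding sequence; it is not the actual leader supplied by pQE-9, whose sequence is not given here.

```python
from itertools import product

# Standard genetic code; codons enumerated with bases in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Hypothetical minimal tag: start codon plus six histidine codons.
HIS6_TAG = "ATG" + "CAT" * 6


def fuse_in_frame(tag: str, cds: str) -> str:
    """Prepend a tag to a coding sequence, refusing joins that would shift the frame."""
    if len(tag) % 3 or len(cds) % 3:
        raise ValueError("tag and CDS must both be whole codons to stay in frame")
    return tag + cds


def translate(seq: str) -> str:
    """Translate complete codons from position 0 using the standard code."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


cds = "GGTTTTCGTGGTGAAGCTCTTTAA"  # hypothetical CDS: GFRGEAL followed by a stop
print(translate(fuse_in_frame(HIS6_TAG, cds)))  # MHHHHHHGFRGEAL*
```

A tag one base too long or too short raises an error here instead of silently shifting every downstream codon.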

The present invention further relates to
polynucleotides which hybridize to the hereinabove-described
sequences if there is at least 50% and preferably 70%
identity between the sequences. The present invention
particularly relates to polynucleotides which hybridize under
stringent conditions to the hereinabove-described
polynucleotides. As herein used, the term "stringent
conditions" means hybridization will occur only if there is
at least 95% and preferably at least 97% identity between the
sequences. The polynucleotides which hybridize to the
hereinabove described polynucleotides in a preferred
embodiment encode polypeptides which retain substantially the
same biological function or activity as the mature
polypeptides encoded by the cDNA of Figures 1, 2 and 3 (SEQ
ID No. 1, 3 and 5) or the deposited cDNA(s).
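The identity thresholds above can be expressed as a simple percent-identity calculation. The following minimal sketch assumes two already-aligned, gap-free sequences of equal length; in practice hybridization stringency is controlled through conditions such as temperature and salt concentration, and the threshold check here only mirrors the percentages used in the definition.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of aligned positions that are identical (gap-free, equal length)."""
    if len(a) != len(b) or not a:
        raise ValueError("expects two non-empty sequences of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def meets_stringent(a: str, b: str, threshold: float = 95.0) -> bool:
    """True if identity reaches the 'stringent' threshold (95%, or 97% preferred)."""
    return percent_identity(a, b) >= threshold


ref = "ACGT" * 10    # hypothetical 40-nucleotide target
var = "T" + ref[1:]  # one mismatch out of 40 positions
print(percent_identity(ref, var))       # 97.5
print(meets_stringent(ref, var, 97.0))  # True
```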

The deposit(s) referred to herein will be maintained
under the terms of the Budapest Treaty on the International
Recognition of the Deposit of Micro-organisms for purposes of
Patent Procedure. These deposits are provided merely as a
convenience to those of skill in the art and are not an
admission that a deposit is required under 35 U.S.C. §112.
The sequences of the polynucleotides contained in the
deposited materials, as well as the amino acid sequences of
the polypeptides encoded thereby, are incorporated herein by
reference and are controlling in the event of any conflict
with any description of sequences herein. A license may be
required to make, use or sell the deposited materials, and
no such license is hereby granted.

The present invention further relates to polypeptides 
which have the deduced amino acid sequences of Figures 1, 2
and 3 (SEQ ID No. 2, 4 and 6) or which have the amino acid
sequence encoded by the deposited cDNA(s), as well as
fragments, analogs and derivatives of such polypeptides. 
The terms "fragment," "derivative" and "analog," when
referring to the polypeptides of Figures 1, 2 and 3 (SEQ ID
No. 2, 4 and 6) or that encoded by the deposited cDNA(s),
mean polypeptides which retain essentially the same
biological function or activity as such polypeptides. Thus,
an analog includes a proprotein which can be activated by
cleavage of the proprotein portion to produce an active
mature polypeptide.

The polypeptides of the present invention may be a 
recombinant polypeptide, a natural polypeptide or a synthetic 
polypeptide, preferably a recombinant polypeptide. 

The fragment, derivative or analog of the polypeptides
of Figures 1, 2 and 3 (SEQ ID No. 2, 4 and 6) or that encoded
by the deposited cDNAs may be (i) one in which one or more of
the amino acid residues are substituted with a conserved or
non-conserved amino acid residue (preferably a conserved
amino acid residue), and such substituted amino acid residue
may or may not be one encoded by the genetic code, or (ii)
one in which one or more of the amino acid residues includes
a substituent group, or (iii) one in which the mature
polypeptide is fused with another compound, such as a
compound to increase the half-life of the polypeptide (for
example, polyethylene glycol). Such fragments, derivatives
and analogs are deemed to be within the scope of those
skilled in the art from the teachings herein.

The polypeptides and polynucleotides of the present 
invention are preferably provided in an isolated form, and 
preferably are purified to homogeneity. 

The term "isolated" means that the material is removed
from its original environment (e.g., the natural environment
if it is naturally occurring). For example, a
naturally-occurring polynucleotide or polypeptide present in a
living animal is not isolated, but the same polynucleotide or
polypeptide, separated from some or all of the co-existing
materials in the natural system, is isolated. Such
polynucleotides could be part of a vector and/or such
polynucleotides or polypeptides could be part of a
composition, and still be isolated in that such vector or
composition is not part of its natural environment.

The present invention also relates to vectors which 
include polynucleotides of the present invention, host cells 
which are genetically engineered with vectors of the 
invention and the production of polypeptides of the invention 
by recombinant techniques.

Host cells are genetically engineered (transduced or 
transformed or transfected) with the vectors of this 
invention which may be, for example, a cloning vector or an 
expression vector. The vector may be, for example, in the 
form of a plasmid, a viral particle, a phage, etc. The 
engineered host cells can be cultured in conventional 
nutrient media modified as appropriate for activating 
promoters, selecting transformants or amplifying the hMLH1,
hMLH2 and hMLH3 genes. The culture conditions, such as
temperature, pH and the like, are those previously used with
the host cell selected for expression, and will be apparent 
to the ordinarily skilled artisan. 

The polynucleotides of the present invention may be
employed for producing polypeptides by recombinant
techniques. Thus, for example, the polynucleotide may be
included in any one of a variety of expression vectors for
expressing a polypeptide. Such vectors include chromosomal,
nonchromosomal and synthetic DNA sequences, e.g.,
derivatives of SV40; bacterial plasmids; phage DNA;
baculovirus; yeast plasmids; vectors derived from
combinations of plasmids and phage DNA; and viral DNA such as
vaccinia, adenovirus, fowl pox virus and pseudorabies.
However, any other vector may be used as long as it is
replicable and viable in the host.

The appropriate DNA sequence may be inserted into the
vector by a variety of procedures. In general, the DNA
sequence is inserted into an appropriate restriction
endonuclease site(s) by procedures known in the art. Such
procedures and others are deemed to be within the scope of
those skilled in the art.
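Insertion at a restriction endonuclease site relies on the enzyme cutting at a defined position within its recognition sequence. As an illustration, EcoRI recognizes GAATTC and cuts the top strand between the G and the first A (G^AATTC); the sketch below uses that to compute top-strand fragment lengths for a hypothetical linear sequence. The choice of EcoRI is an assumption, since the text names no particular site, and the sketch ignores the bottom strand and the 4-base overhangs.

```python
ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1  # EcoRI cuts G^AATTC, i.e. one base into the site


def digest_fragment_lengths(seq: str, site: str = ECORI_SITE,
                            offset: int = ECORI_CUT_OFFSET) -> list[int]:
    """Top-strand fragment lengths after cutting a linear sequence at every site."""
    cuts = [i + offset for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] == site]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


insert = "AAAA" + "GAATTC" + "CCCCCCCC" + "GAATTC" + "TT"  # hypothetical, 26 nt
print(digest_fragment_lengths(insert))  # [5, 14, 7]
```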

The DNA sequence in the expression vector is operatively
linked to an appropriate expression control sequence(s)
(promoter) to direct mRNA synthesis. As representative
examples of such promoters, there may be mentioned: LTR or
SV40 promoter, the E. coli lac or trp, the phage lambda PL
promoter and other promoters known to control expression of
genes in prokaryotic or eukaryotic cells or their viruses.
The expression vector also contains a ribosome binding site
for translation initiation and a transcription terminator.
The vector may also include appropriate sequences for
amplifying expression.

In addition, the expression vectors preferably contain
one or more selectable marker genes to provide a phenotypic
trait for selection of transformed host cells, such as
dihydrofolate reductase or neomycin resistance for eukaryotic
cell culture, or such as tetracycline or ampicillin
resistance in E. coli.

The vector containing the appropriate DNA sequence as 
hereinabove described, as well as an appropriate promoter or 
control sequence, may be employed to transform an appropriate
host to permit the host to express the proteins. 

As representative examples of appropriate hosts, there
may be mentioned: bacterial cells, such as E. coli,
Streptomyces, Salmonella typhimurium; fungal cells, such as
yeast; insect cells such as Drosophila S2 and Spodoptera Sf9;
animal cells such as CHO, COS or Bowes melanoma;
adenoviruses; plant cells, etc. The selection of an
appropriate host is deemed to be within the scope of those
skilled in the art from the teachings herein.

More particularly, the present invention also includes
recombinant constructs comprising one or more of the
sequences as broadly described above. The constructs
comprise a vector, such as a plasmid or viral vector, into
which a sequence of the invention has been inserted, in a
forward or reverse orientation. In a preferred aspect of
this embodiment, the construct further comprises regulatory
sequences, including, for example, a promoter, operably
linked to the sequence. Large numbers of suitable vectors
and promoters are known to those of skill in the art, and are
commercially available. The following vectors are provided
by way of example. Bacterial: pQE70, pQE60, pQE-9 (Qiagen,
Inc.), pBS, pD10, phagescript, psiX174, pBluescript SK,
pBSKS, pNH8A, pNH16a, pNH18A, pNH46A (Stratagene); pTRC99a,
pKK223-3, pKK233-3, pDR540, pRIT5 (Pharmacia). Eukaryotic:
pWLNEO, pSV2CAT, pOG44, pXT1, pSG (Stratagene); pSVK3, pBPV,
pMSG, pSVL (Pharmacia). However, any other plasmid or vector
may be used as long as it is replicable and viable in the
host.

Promoter regions can be selected from any desired gene
using CAT (chloramphenicol transferase) vectors or other
vectors with selectable markers. Two appropriate vectors are
pKK232-8 and pCM7. Particular named bacterial promoters
include lacI, lacZ, T3, T7, gpt, lambda PR, PL and trp.
Eukaryotic promoters include CMV immediate early, HSV
thymidine kinase, early and late SV40, LTRs from retrovirus,
and mouse metallothionein-I. Selection of the appropriate
vector and promoter is well within the level of ordinary
skill in the art.

In a further embodiment, the present invention relates
to host cells containing the above-described constructs. The
host cell can be a higher eukaryotic cell, such as a
mammalian cell, or a lower eukaryotic cell, such as a yeast
cell, or the host cell can be a prokaryotic cell, such as a
bacterial cell. Introduction of the construct into the host
cell can be effected by calcium phosphate transfection,
DEAE-Dextran mediated transfection, or electroporation (Davis,
L., Dibner, M., Battey, I., Basic Methods in Molecular Biology,
(1986)).

The constructs in host cells can be used in a 
conventional manner to produce the gene product encoded by 
the recombinant sequence. Alternatively, the polypeptides of 
the invention can be synthetically produced by conventional 
peptide synthesizers. 

Mature proteins can be expressed in mammalian cells,
yeast, bacteria, or other cells under the control of 
appropriate promoters. Cell-free translation systems can 
also be employed to produce such proteins using RNAs derived 
from the DNA constructs of the present invention. 
Appropriate cloning and expression vectors for use with 
prokaryotic and eukaryotic hosts are described by Sambrook, 
et al., Molecular Cloning: A Laboratory Manual, Second
Edition, Cold Spring Harbor, N.Y., (1989), the disclosure of 
which is hereby incorporated by reference. 

Transcription of the DNA encoding the polypeptides of
the present invention by higher eukaryotes is increased by
inserting an enhancer sequence into the vector. Enhancers
are cis-acting elements of DNA, usually from about 10 to 300
bp, that act on a promoter to increase its transcription.
Examples include the SV40 enhancer on the late side of the
replication origin (bp 100 to 270), a cytomegalovirus early
promoter enhancer, the polyoma enhancer on the late side of
the replication origin, and adenovirus enhancers.

Generally, recombinant expression vectors will include
origins of replication and selectable markers permitting
transformation of the host cell, e.g., the ampicillin
resistance gene of E. coli and the S. cerevisiae TRP1 gene,
and a promoter derived from a highly-expressed gene to direct
transcription of a downstream structural sequence. Such
promoters can be derived from operons encoding glycolytic
enzymes such as 3-phosphoglycerate kinase (PGK), α-factor,
acid phosphatase, or heat shock proteins, among others. The
heterologous structural sequence is assembled in appropriate
phase with translation initiation and termination sequences.
Optionally, the heterologous sequence can encode a fusion
protein including an N-terminal identification peptide
imparting desired characteristics, e.g., stabilization or
simplified purification of expressed recombinant product.

Useful expression vectors for bacterial use are
constructed by inserting a structural DNA sequence encoding
a desired protein together with suitable translation
initiation and termination signals in operable reading phase
with a functional promoter. The vector will comprise one or
more phenotypic selectable markers and an origin of
replication to ensure maintenance of the vector and, if
desirable, to provide amplification within the host. Suitable
prokaryotic hosts for transformation include E. coli,
Bacillus subtilis, Salmonella typhimurium and various species
within the genera Pseudomonas, Streptomyces and
Staphylococcus, although others may also be employed as a
matter of choice.

As a representative but nonlimiting example, useful
expression vectors for bacterial use can comprise a
selectable marker and bacterial origin of replication derived
from commercially available plasmids comprising genetic
elements of the well known cloning vector pBR322 (ATCC
37017). Such commercial vectors include, for example,
pKK223-3 (Pharmacia Fine Chemicals, Uppsala, Sweden) and GEM1
(Promega Biotec, Madison, WI, USA). These pBR322 "backbone"
sections are combined with an appropriate promoter and the
structural sequence to be expressed.

Following transformation of a suitable host strain and
growth of the host strain to an appropriate cell density, the
selected promoter is induced by appropriate means (e.g.,
temperature shift or chemical induction) and cells are
cultured for an additional period.
Cells are typically harvested by centrifugation,
disrupted by physical or chemical means, and the resulting
crude extract retained for further purification.

Microbial cells employed in expression of proteins can
be disrupted by any convenient method, including freeze-thaw
cycling, sonication, mechanical disruption, or use of cell
lysing agents; such methods are well known to those skilled
in the art.

Various mammalian cell culture systems can also be
employed to express recombinant protein. Examples of
mammalian expression systems include the COS-7 lines of
monkey kidney fibroblasts, described by Gluzman, Cell, 23:175
(1981), and other cell lines capable of expressing a
compatible vector, for example, the C127, 3T3, CHO, HeLa and
BHK cell lines. Mammalian expression vectors will comprise
an origin of replication, a suitable promoter and enhancer,
and also any necessary ribosome binding sites,
polyadenylation site, splice donor and acceptor sites,
transcriptional termination sequences, and 5' flanking
nontranscribed sequences. DNA sequences derived from the
SV40 splice and polyadenylation sites may be used to provide
the required nontranscribed genetic elements.

The polypeptides can be recovered and purified from
recombinant cell cultures by methods including ammonium
sulfate or ethanol precipitation, acid extraction, anion or
cation exchange chromatography, phosphocellulose
chromatography, hydrophobic interaction chromatography,
affinity chromatography, hydroxylapatite chromatography and
lectin chromatography. Protein refolding steps can be used,
as necessary, in completing configuration of the mature
protein. Finally, high performance liquid chromatography
(HPLC) can be employed for final purification steps.

The polypeptides of the present invention may be a
naturally purified product, or a product of chemical
synthetic procedures, or produced by recombinant techniques
from a prokaryotic or eukaryotic host (for example, by
bacterial, yeast, higher plant, insect and mammalian cells in
culture). Depending upon the host employed in a recombinant
production procedure, the polypeptides of the present
invention may be glycosylated or may be non-glycosylated.

In accordance with a further aspect of the invention,
there is provided a process for determining susceptibility to
cancer, in particular, a hereditary cancer. Thus, a mutation
in a human repair protein which is a human homolog of mutL,
and in particular those described herein, indicates a
susceptibility to cancer, and the nucleic acid sequences
encoding such human homologs may be employed in an assay for
ascertaining such susceptibility. Thus, for example, the
assay may be employed to determine a mutation in a human DNA
repair protein as herein described, such as a deletion,
truncation, insertion, frame shift, etc., with such mutation
being indicative of a susceptibility to cancer.

A mutation may be ascertained, for example, by a DNA
sequencing assay. Tissue samples, including but not limited
to blood samples, are obtained from a human patient. The
samples are processed by methods known in the art to capture
the RNA. First strand cDNA is synthesized from the RNA
samples by adding an oligonucleotide primer consisting of
polythymidine residues which hybridize to the polyadenosine
stretch present on the mRNAs. Reverse transcriptase and
deoxynucleotides are added to allow synthesis of the first
strand cDNA. Primer sequences are synthesized based on the
DNA sequence of the DNA repair protein of the invention. The
primer sequence is generally comprised of 15 to 30, and
preferably from 18 to 25, consecutive bases of the human DNA
repair gene. Table 1 sets forth an illustrative example of
oligonucleotide primer sequences based on hMLH1. The primers
are used in pairs (one "sense" strand and one "anti-sense")
to amplify the cDNA from the patients by the PCR method
(Saiki et al., Nature, 324:163-166 (1986)) such that three
overlapping fragments of the patient's cDNAs for such
protein are generated. Table 1 also shows a list of
preferred primer sequence pairs. The overlapping fragments
are then subjected to dideoxynucleotide sequencing using a
set of primer sequences synthesized to correspond to the base
pairs of the cDNAs at a point approximately every 200 base
pairs throughout the gene.
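The spacing rule just described (sequencing primers placed roughly every 200 bp, each within the preferred 18 to 25 base range) can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the procedure of the invention: the function name `tile_sequencing_primers` is hypothetical, the input is assumed to be the sense strand written 5' to 3' with the A of ATG at position 1, and real primer design would also screen melting temperature, GC content and secondary structure, which this sketch ignores.

```python
def tile_sequencing_primers(cdna: str, step: int = 200, length: int = 20) -> list[tuple[int, str]]:
    """Return (1-based start position, primer) pairs spaced `step` bases apart.

    Hypothetical helper: assumes `cdna` is the sense strand, 5'->3',
    and takes each primer verbatim from that strand (sense primers only).
    """
    if not 15 <= length <= 30:
        raise ValueError("primer length outside the 15-30 base range")
    cdna = cdna.upper()
    primers = []
    # Step along the sequence, stopping where a full-length primer no longer fits.
    for start in range(0, len(cdna) - length + 1, step):
        primers.append((start + 1, cdna[start:start + length]))
    return primers


# Toy 600-base sequence (not hMLH1): primers start at positions 1, 201 and 401.
for position, primer in tile_sequencing_primers("ACGT" * 150):
    print(position, primer)
```

A fixed 20-base length keeps the example inside the preferred range; a fuller design would let the length float between 18 and 25 bases to balance melting temperatures.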
TABLE 1

Primer Sequences Used to Amplify Gene Regions Using PCR

Name    Start Site Number*       Sequence
        and Arrangement

758     sense-(-41)              GTTGAACATCTAGACGTCTC
1319    sense-8                  TCGTGGCAGGGGTTATTCG
1321    sense-619                CTACCCAATGCCTCAACCG
1322    sense-677                GAGAACTGATAGAAATTGGATG
1314    sense-1548               GGGACATGAGGTTCTCCG
1323    sense-1593               GGGCTGTGTGAATCCTCAG
773     anti-53                  CGGTTCACCACTGTCTCGTC
1313    anti-971                 TCCAGGATGCTCTCCTCG
1320    anti-1057                CAAGTCCTGGTAGCAAAGTC
1315    anti-1760                ATGGCAAGGTCAAAGAGCG
1316    anti-1837                CAACAATGTATTCAGCAAGTCC
1317    anti-2340                TTGATACAACACrri'GTATCG
1318    anti-2415                GGAATACTATCAGAAGGCAAG

* Numbers correspond to locations along the nucleotide
sequence of Figure 1, where ATG is number 1.

Preferred primer sequence pairs:

758, 1313
1319, 1320
660, 1909
725, 1995
1680, 2536
1727, 2610

The nucleotide sequences shown in Table 1 represent SEQ ID
No. 7 through 19, respectively.
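The "sense-N" and "anti-N" labels in Table 1 can be read as coordinates on the Figure 1 numbering. The Python sketch below makes that reading explicit under one assumption: the number marks the base at the primer's 5' end. With that assumption, a sense primer reads forward from position N, and an anti-sense primer is the reverse complement of the bases ending at N. For example, an "anti-46" 19-mer would anneal opposite sense positions 28 to 46. The helper names are hypothetical, and the toy sequence is not hMLH1.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written with A/C/G/T."""
    return seq.translate(COMPLEMENT)[::-1]


def seq_index(position: int, atg_index: int) -> int:
    """Map a Figure 1 style position (A of ATG = 1, no position 0) to a 0-based index."""
    if position == 0:
        raise ValueError("the numbering has no position 0")
    return atg_index + position - 1 if position > 0 else atg_index + position


def primer_at(cdna: str, atg_index: int, position: int, arrangement: str, length: int) -> str:
    """Return the primer whose 5' end sits at `position` (hypothetical convention)."""
    i = seq_index(position, atg_index)
    if arrangement == "sense":
        start, end = i, i + length            # read forward on the sense strand
    elif arrangement == "anti":
        start, end = i - length + 1, i + 1    # bases ending at `position`
    else:
        raise ValueError("arrangement must be 'sense' or 'anti'")
    if start < 0 or end > len(cdna):
        raise ValueError("primer runs off the end of the sequence")
    window = cdna[start:end]
    return window if arrangement == "sense" else reverse_complement(window)


# Toy sequence: 5 bases of 5' UTR followed by a short ORF starting at index 5.
toy = "GGGCC" + "ATGAAACCCGGGTTT"
print(primer_at(toy, 5, 4, "sense", 6))   # AAACCC
print(primer_at(toy, 5, 9, "anti", 6))    # GGGTTT
print(primer_at(toy, 5, -2, "sense", 4))  # CCAT
```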
Table 2 lists representative examples of
oligonucleotide primer sequences (sense and anti-sense)
which may be used, and preferably the entire set of primer
sequences is used for sequencing to determine where a
mutation in the patient DNA repair protein may be. The
primer sequences may be from 15 to 30 bases in length and
are preferably between 18 and 25 bases in length. The
sequence information determined from the patient is then
compared to non-mutated sequences to determine if any
mutations are present.
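The comparison of patient sequence with non-mutated sequence can be illustrated with Python's standard `difflib.SequenceMatcher`. This is a minimal sketch, not the method of the invention. `SequenceMatcher` is a general-purpose matcher rather than a biological aligner, with no substitution scoring or gap penalties, so a real assay would use a dedicated alignment tool. The function name `list_differences` and the toy sequences are hypothetical.

```python
from difflib import SequenceMatcher


def list_differences(reference: str, patient: str) -> list[tuple[str, int, int, str, str]]:
    """Return (kind, ref_start, ref_end, ref_bases, patient_bases) for each difference.

    kind is difflib's opcode tag: "replace", "delete" or "insert".
    Reference coordinates are 1-based and inclusive; for an "insert",
    the inserted bases lie immediately after reference position ref_end.
    """
    matcher = SequenceMatcher(None, reference, patient, autojunk=False)
    differences = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            differences.append((tag, i1 + 1, i2, reference[i1:i2], patient[j1:j2]))
    return differences


# Toy example: the patient sequence lacks bases 7-9 of the reference.
print(list_differences("ATGAAACCCGGGTTTTAA", "ATGAAAGGGTTTTAA"))
# [('delete', 7, 9, 'CCC', '')]
```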

TABLE 2

Primer Sequences Used to Sequence the Amplified Fragments

Name    Number    Arrangement and        Sequence
                  Start Site Number*

5282    seq01     sense-377              ACAGAGCAAGTTACTCAGATG
5283    seq02     sense-552              GTACACAATGCAGGCATTAG
5284    seq03     sense-904              AATGTGGATGTTAATGTGCAC
5285    seq04     sense-1096             CTGACCTCXSTCTrCCTAC
5286    seq05     sense-1276             CAGCAAGATGAGGAGATGC
5287    seq06     sense-1437             GGAAATGGTGGAAGATGATTC
5288    seq07     sense-1645             CrrCTCAACACCAAGC
5289    seq08     sense-1895             GAAATTGATGAGGAAGGGAAC
5295    seq09     sense-1921             CTTCTGATTGACAACTATGTGC
5294    seq10     sense-2202             CACAGAAGATGGAAATATCCTG
5293    seq11     sense-2370             GTGTTGGTAGCACTTAAGAC
5291    seq12     anti-525               TTTCCCATATTCTTCACTTG
5290    seq13     anti-341               GTAACATGAGCCACATGGC
5292    seq14     anti-46                CCACTGTCTCGTCCAGCCG

* Numbers correspond to locations along the nucleotide
sequence of Figure 1, where ATG is number 1.

The nucleotide sequences shown in Table 2 represent SEQ ID
No. 20 through 33, respectively.

In another embodiment, the primer sequences from Table
2 could be used in the PCR method to amplify a mutated
region. The region could be sequenced and used as a
diagnostic to predict a predisposition to such mutated
genes.

Alternatively, the assay to detect mutations in the
genes of the present invention may be performed by genetic
testing based on DNA sequence differences, achieved by
detection of alterations in the electrophoretic mobility of
DNA fragments in gels with or without denaturing agents.
Small sequence deletions and insertions can be visualized by
high resolution gel electrophoresis. DNA fragments of
different sequences may be distinguished on denaturing
formamide gradient gels in which the mobilities of different
DNA fragments are retarded in the gel at different positions
according to their specific melting or partial melting
temperatures (see, e.g., Myers et al., Science, 230:1242
(1985)).

Sequence changes at specific locations may also be
revealed by nuclease protection assays, such as RNase and
S1 protection, or by the chemical cleavage method (e.g.,
Cotton et al., PNAS, USA, 85:4397-4401 (1985)). Perfectly
matched sequences can be distinguished from mismatched
duplexes by RNase A digestion or by differences in melting
temperatures.

Thus, the detection of a specific DNA sequence may be
achieved by methods such as hybridization, RNase
protection, chemical cleavage, Western blot analysis,
direct DNA sequencing or the use of restriction enzymes
(e.g., restriction fragment length polymorphisms (RFLP))
and Southern blotting of genomic DNA.

In addition to more conventional gel electrophoresis
and DNA sequencing, mutations can also be detected by in
situ analysis.

The polypeptides may also be employed to treat cancers
or to prevent cancers by expression of such polypeptides
in vivo, which is often referred to as "gene therapy."

Thus, for example, cells from a patient may be 
engineered with a polynucleotide (DNA or RNA) encoding a 
polypeptide ex vivo, with the engineered cells then being 
provided to a patient to be treated with the polypeptide. 
Such methods are well-known in the art. For example, cells 
may be engineered by procedures known in the art by use of 
a retroviral particle containing RNA encoding a polypeptide 
of the present invention. 

Similarly, cells may be engineered in vivo for
expression of a polypeptide in vivo by, for example,
procedures known in the art. As known in the art, a
producer cell for producing a retroviral particle
containing RNA encoding the polypeptide of the present
invention may be administered to a patient for engineering
cells in vivo and expression of the polypeptide in vivo.
These and other methods for administering a polypeptide of
the present invention by such methods should be apparent to
those skilled in the art from the teachings of the present
invention. For example, the expression vehicle for
engineering cells may be other than a retrovirus, for
example, an adenovirus which may be used to engineer cells
in vivo after combination with a suitable delivery vehicle.

Each of the cDNA sequences identified herein or a 
portion thereof can be used in numerous ways as 
polynucleotide reagents. The sequences can be used as 
diagnostic probes for the presence of a specific mRNA in a 
particular cell type. In addition, these sequences can be 
used as diagnostic probes suitable for use in genetic
linkage analysis (polymorphisms).

The sequences of the present invention are also 
valuable for chromosome identification. The sequence is
specifically targeted to and can hybridize with a 
particular location on an individual human chromosome. 
Moreover, there is a current need for identifying 
particular sites on the chromosome. Few chromosome marking 
reagents based on actual sequence data (repeat 
polymorphisms) are presently available for marking 
chromosomal location. The mapping of DNAs to chromosomes 
according to the present invention is an important first 
step in correlating those sequences with genes associated 
with disease. 

Briefly, sequences can be mapped to chromosomes by
preparing PCR primers (preferably 15-25 bp) from the cDNA.
Computer analysis of the 3' untranslated region is used to
rapidly select primers that do not span more than one exon
in the genomic DNA, which would complicate the amplification
process. These primers are then used for PCR screening of
somatic cell hybrids containing individual human
chromosomes. Only those hybrids containing the human gene
corresponding to the primer will yield an amplified
fragment.

PCR mapping of somatic cell hybrids is a rapid
procedure for assigning a particular DNA to a particular
chromosome. Using the present invention with the same
oligonucleotide primers, sublocalization can be achieved
with panels of fragments from specific chromosomes or pools
of large genomic clones in an analogous manner. Other
mapping strategies that can similarly be used to map to its
chromosome include in situ hybridization, prescreening with
labeled flow-sorted chromosomes and preselection by
hybridization to construct chromosome-specific cDNA
libraries.

Fluorescence in situ hybridization (FISH) of a cDNA
clone to a metaphase chromosomal spread can be used to
provide a precise chromosomal location in one step. This
technique can be used with cDNA as short as 500 or 600
bases; however, clones larger than that have a higher
likelihood of binding to a unique chromosomal location with
sufficient signal intensity for simple detection. FISH
requires use of the clones from which the expressed sequence
tag (EST) was derived, and the longer the better. For
example, 2,000 bp is good, 4,000 is better, and more than
4,000 is probably not necessary to get good results a
reasonable percentage of the time. For a review of this
technique, see Verma et al., Human Chromosomes: a Manual
of Basic Techniques, Pergamon Press, New York (1988).

Once a sequence has been mapped to a precise
chromosomal location, the physical position of the sequence
on the chromosome can be correlated with genetic map data.
Such data are found, for example, in V. McKusick, Mendelian
Inheritance in Man (available on line through Johns Hopkins
University Welch Medical Library). The relationship
between genes and diseases that have been mapped to the
same chromosomal region is then identified through linkage
analysis (coinheritance of physically adjacent genes).

Next, it is necessary to determine the differences in 
the cDNA or genomic sequence between affected and 
unaffected individuals. If a mutation is observed in some 
or all of the affected individuals but not in any normal 
individuals, then the mutation is likely to be the 
causative agent of the disease. 

With current resolution of physical mapping and 
genetic mapping techniques, a cDNA precisely localized to a 
chromosomal region associated with the disease could be one 
of between 50 and 500 potential causative genes. (This
assumes 1 megabase mapping resolution and one gene per 20
kb.)
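The gene count in the parenthetical follows from dividing the mapped interval by the assumed gene spacing. At one gene per 20 kb, a 1 megabase interval gives 50 candidates. The upper figure of 500 would correspond to a 10 megabase interval at the same density. The function name below is illustrative only, and the uniform-density assumption is the document's own simplification.

```python
def candidate_genes(interval_bp: int, bp_per_gene: int = 20_000) -> float:
    """Expected number of genes in a mapped interval, assuming uniform gene density."""
    return interval_bp / bp_per_gene


print(candidate_genes(1_000_000))   # 50.0
print(candidate_genes(10_000_000))  # 500.0
```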

hMLH2 has been localized using a genomic P1 clone
(1670) which contained the 5' region of the hMLH2 gene.
Detailed analysis of human metaphase chromosome spreads,
counterstained to reveal banding, indicated that the hMLH2
gene was located within band 2q32. Likewise, hMLH3 was
localized using a genomic P1 clone (2053) which contained
the 3' region of the hMLH3 gene. Detailed analysis of
human metaphase chromosome spreads, counterstained to
reveal banding, indicated that the hMLH3 gene was located
within band 7p22, the most distal band on chromosome 7.
Analysis with a variety of genomic clones showed that hMLH3
was a member of a subfamily of related genes, all on
chromosome 7.

The polypeptides, their fragments or other
derivatives, or analogs thereof, or cells expressing them
can be used as an immunogen to produce antibodies thereto.
These antibodies can be, for example, polyclonal or
monoclonal antibodies. The present invention also includes
chimeric, single chain, and humanized antibodies, as well
as Fab fragments, or the product of an Fab expression
library. Various procedures known in the art may be used
for the production of such antibodies and fragments.

Antibodies generated against the polypeptides
corresponding to a sequence of the present invention can be
obtained by direct injection of the polypeptides into an
animal or by administering the polypeptides to an animal,
preferably a nonhuman. The antibody so obtained will then
bind the polypeptide itself. In this manner, even a
sequence encoding only a fragment of the polypeptide can
be used to generate antibodies binding the whole native
polypeptide. Such antibodies can then be used to isolate
the polypeptide from tissue expressing that polypeptide.

For preparation of monoclonal antibodies, any
technique which provides antibodies produced by continuous
cell line cultures can be used. Examples include the
hybridoma technique (Kohler and Milstein, 1975, Nature,
256:495-497), the trioma technique, the human B-cell
hybridoma technique (Kozbor et al., 1983, Immunology Today
4:72), and the EBV-hybridoma technique to produce human
monoclonal antibodies (Cole, et al., 1985, in Monoclonal
Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp.
77-96).

Techniques described for the production of single
chain antibodies (U.S. Patent 4,946,778) can be adapted to
produce single chain antibodies to immunogenic polypeptide 
products of this invention. Also, transgenic mice may be 
used to express humanized antibodies to immunogenic 
polypeptide products of this invention. 

The present invention will be further described with
reference to the following examples; however, it is to be
understood that the present invention is not limited to
such examples. All parts or amounts, unless otherwise
specified, are by weight.

In order to facilitate understanding of the following
examples, certain frequently occurring methods and/or terms
will be described.
"Plasmids" are designated by a lower case p preceded
and/or followed by capital letters and/or numbers. The 
starting plasmids herein are either commercially available, 
publicly available on an unrestricted basis, or can be 
constructed from available plasmids in accord with 
published procedures. In addition, equivalent plasmids to 
those described are known in the art and will be apparent 
to the ordinarily skilled artisan. 

"Digestion" of DNA refers to catalytic cleavage of the 
DNA with a restriction enzyme that acts only at certain 
sequences in the DNA. The various restriction enzymes used 
herein are commercially available and their reaction 
conditions, cofactors and other requirements were used as
would be known to the ordinarily skilled artisan. For
analytical purposes, typically 1 µg of plasmid or DNA
fragment is used with about 2 units of enzyme in about 20
µl of buffer solution. For the purpose of isolating DNA
fragments for plasmid construction, typically 5 to 50 µg of
DNA are digested with 20 to 250 units of enzyme in a
larger volume. Appropriate buffers and substrate amounts
for particular restriction enzymes are specified by the
manufacturer. Incubation times of about 1 hour at 37°C are
ordinarily used, but may vary in accordance with the
supplier's instructions. After digestion the reaction is
electrophoresed directly on a polyacrylamide gel to isolate
the desired fragment.
Size separation of the cleaved fragments is performed
using an 8 percent polyacrylamide gel as described by
Goeddel, D. et al., Nucleic Acids Res., 8:4057 (1980).

"Oligonucleotides" refers to either a single stranded 
polydeoxynucleotide or two complementary
polydeoxynucleotide strands which may be chemically 
synthesized. Such synthetic oligonucleotides have no 5' 
phosphate and thus will not ligate to another 
oligonucleotide without adding a phosphate with an ATP in 
the presence of a kinase. A synthetic oligonucleotide will 
ligate to a fragment that has not been dephosphorylated. 

"Ligation" refers to the process of forming 
phosphodiester bonds between two double stranded nucleic 
acid fragments (Maniatis, T., et al., Id., p. 146). Unless
otherwise provided, ligation may be accomplished using
known buffers and conditions with 10 units of T4 DNA ligase
("ligase") per 0.5 ng of approximately equimolar amounts of
the DNA fragments to be ligated. 

Unless otherwise stated, transformation was performed 
as described in the method of Graham, F. and Van der Eb, 
A., Virology, 52:456-457 (1973). 

Example 1 
Bacterial Expression of hMLH1

The full length DNA sequence encoding human DNA
mismatch repair protein hMLH1, ATCC # 75649, is initially
amplified using PCR oligonucleotide primers corresponding
to the 5' and 3' ends of the DNA sequence to synthesize
insertion fragments. The 5' oligonucleotide primer has the
sequence 5' CGGGATCCATGTCGTTCGTGGCAGGG 3' (SEQ ID No. 34)
and contains a BamHI restriction enzyme site followed by 18
nucleotides of hMLH1 coding sequence beginning with the
initiation codon; the 3' primer sequence
5' GCTCTAGATTAACACCTCTCAAAGAC 3' (SEQ ID No. 35) contains
an XbaI site followed by sequence complementary to the end
of the gene. The restriction enzyme sites correspond to the
restriction enzyme sites on the bacterial expression vector
pQE-9 (Qiagen, Inc., Chatsworth, CA). The plasmid vector
encodes antibiotic resistance (Amp^r), a bacterial origin of
replication (ori), an IPTG-regulatable promoter/operator
(P/O), a ribosome binding site (RBS), a 6-histidine tag
(6-His) and restriction enzyme cloning sites. The pQE-9
vector is digested with BamHI and XbaI and the insertion
fragments are then ligated into the pQE-9 vector,
maintaining the reading frame initiated at the bacterial
RBS. The ligation mixture is then used to transform the
E. coli strain M15/rep4 (Qiagen, Inc.), which contains
multiple copies of the plasmid pREP4, which expresses the
lacI repressor and also confers kanamycin resistance
(Kan^r). Transformants are identified by their ability to
grow on LB plates, and ampicillin/kanamycin resistant
colonies are selected. Plasmid DNA is isolated and
confirmed by restriction analysis. Clones containing the
desired constructs are grown overnight (O/N) in liquid
culture in LB media supplemented with both Amp (100 µg/ml)
and Kan (25 µg/ml). The O/N culture is used to inoculate a
large culture at a ratio of 1:100 to 1:250. The cells are
grown to an optical density at 600 nm (OD600) of between
0.4 and 0.6. IPTG (isopropyl-β-D-thiogalactopyranoside) is
then added to a final concentration of 1 mM. IPTG induces
by inactivating the lacI repressor, clearing the P/O and
leading to increased gene expression. Cells are grown for
an extra 3 to 4 hours and are then harvested by
centrifugation (20 min at 6000 × g). The cell pellet is
solubilized in the chaotropic agent 6 M guanidine HCl.
After clarification, solubilized hMLH1 is purified from
this solution by chromatography on a nickel-chelate column
under conditions that allow for tight binding by proteins
containing the 6-His tag (Hochuli, E. et al., Genetic
Engineering, Principles & Methods, 12:87-98 (1990)).
Protein renaturation out of GnHCl can be accomplished by
several protocols (Jaenicke, R. and Rudolph, R., Protein
Structure - A Practical Approach, IRL Press, New York
(1990)). Initially, step dialysis is utilized to remove
the GnHCl. Alternatively, the purified protein isolated
from the Ni-chelate column can be bound to a second column
over which a decreasing linear GnHCl gradient is run. The
protein is allowed to renature while bound to the column
and is subsequently eluted with a buffer containing 250 mM
imidazole, 150 mM NaCl, 25 mM Tris-HCl pH 7.5 and 10%
glycerol. Finally, soluble protein is dialyzed against a
storage buffer containing 5 mM ammonium bicarbonate. The
purified protein was analyzed by SDS-PAGE.
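The two cloning primers of Example 1 can be dissected in Python to show which part carries the restriction site and which part matches the gene. This is a minimal sketch. The helper names are hypothetical, and the only data used are the primer sequences and the BamHI (GGATCC) and XbaI (TCTAGA) recognition sites. The gene-specific part of the 5' primer is 18 nt long and starts with ATG. The reverse complement of the 3' primer's gene-specific part ends in TAA, consistent with that primer sitting at the end of the coding sequence.

```python
BAMHI = "GGATCC"  # BamHI recognition site
XBAI = "TCTAGA"   # XbaI recognition site
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written with A/C/G/T."""
    return seq.translate(COMPLEMENT)[::-1]


def split_at_site(primer: str, site: str) -> tuple[str, str]:
    """Split a cloning primer into (5' portion through the site, gene-specific 3' portion)."""
    i = primer.find(site)
    if i < 0:
        raise ValueError(f"{site} not found in {primer}")
    return primer[: i + len(site)], primer[i + len(site):]


forward = "CGGGATCCATGTCGTTCGTGGCAGGG"  # SEQ ID No. 34
reverse = "GCTCTAGATTAACACCTCTCAAAGAC"  # SEQ ID No. 35

_, fwd_gene = split_at_site(forward, BAMHI)
_, rev_gene = split_at_site(reverse, XBAI)
print(fwd_gene, len(fwd_gene))       # ATGTCGTTCGTGGCAGGG 18
print(reverse_complement(rev_gene))  # GTCTTTGAGAGGTGTTAA
```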



Example 2 

Spontaneous Mutation Assay for Detection of the Expression
of hMLH1, hMLH2 and hMLH3 and Complementation of the E. coli
mutL Mutation

The pQE9hMLH1, pQE9hMLH2 or pQE9hMLH3/GW3733
transformants were subjected to the spontaneous mutation
assay. The plasmid vector pQE9 was also transformed into
AB1157 (K-12 argE3 hisG4 leuB6 proA2 thr-1 ara-14 rpsL31
supE44 tsx-33) and GW3733 for use as the positive and
negative controls, respectively.

Fifteen 2 ml cultures, inoculated with approximately
100 to 1000 E. coli cells, were grown to 2x10* cells per ml
in LB ampicillin medium at 37°C. Ten microliters of each
culture were diluted and plated on LB ampicillin plates to
measure the number of viable cells. The rest of the cells
from each culture were then concentrated in saline and
plated on minimal plates lacking arginine to measure
reversion to Arg+. In Table 3, the mean number of
mutations per culture (m) was calculated from the median
number (r) of mutants per distribution, according to the
equation r/m - ln(m) = 1.24 (Lea et al., J. Genetics
49:264-285 (1949)). Mutation rates per generation were
recorded as m/N, with N representing the average number of
cells per culture.
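The median-based equation above has no closed-form solution for m, but it can be solved numerically. The Python sketch below solves r/m - ln(m) = 1.24 by bisection and then divides by N to give the rate per generation. It is a minimal illustration, not the authors' calculation. The function names and the example inputs are hypothetical. Bisection works here because g(m) = r/m - ln(m) - 1.24 decreases strictly for m > 0, so exactly one root lies between a tiny positive value and max(r, 1).

```python
import math


def lea_coulson_m(r: float, tol: float = 1e-10) -> float:
    """Solve r/m - ln(m) = 1.24 for m (mutations per culture) given median mutants r."""
    if r <= 0:
        raise ValueError("median number of mutants must be positive")

    def g(m: float) -> float:
        return r / m - math.log(m) - 1.24

    # g(lo) > 0 and g(hi) < 0, so the root is bracketed.
    lo, hi = 1e-12, max(r, 1.0)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid  # root lies above mid
        else:
            hi = mid
    return (lo + hi) / 2


def mutation_rate(median_mutants: float, cells_per_culture: float) -> float:
    """Mutation rate per generation, m/N."""
    return lea_coulson_m(median_mutants) / cells_per_culture


# Hypothetical inputs, for illustration only.
rate = mutation_rate(median_mutants=20, cells_per_culture=2e9)
```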
TABLE 3

Spontaneous Mutation Rates

Strain              Mutations per generation

AB1157+vector       (5.6 ± 0.1) x 10^-9 (a)
GW3733+vector       (1.1 ± 0.2) x 10^-6 (a)
GW3733+phMLH1       (3.7 ± 1.3) x 10^-7 (a)
GW3733+phMLH2       (3.1 ± 0.6) x 10^-7 (b)
GW3733+phMLH3       (2.1 ± 0.8) x 10^-7 (b)

a: Average of three experiments.
b: Average of four experiments.

The functional complementation results show that the
human mutL homologs can partially rescue the E. coli mutL
mutator phenotype, suggesting that the human mutL homologs
are not only successfully expressed in a bacterial
expression system but also function in bacteria.
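One way to see why the rescue is described as partial is to compare fold changes between the Table 3 means. The Python sketch below does that. It uses only the mean rates and ignores the ± errors, so it is a rough reading rather than a statistical test. The function name `fold_changes` is illustrative. On these means, each human clone lowers the GW3733 rate roughly 3- to 5-fold, but the complemented strains stay roughly 38- to 66-fold above AB1157.

```python
# Mean rates from Table 3 (the ± errors are ignored in this sketch).
TABLE_3_RATES = {
    "AB1157+vector": 5.6e-9,
    "GW3733+vector": 1.1e-6,
    "GW3733+phMLH1": 3.7e-7,
    "GW3733+phMLH2": 3.1e-7,
    "GW3733+phMLH3": 2.1e-7,
}


def fold_changes(rates, mutant="GW3733+vector", wild_type="AB1157+vector"):
    """Return {strain: (fold reduction vs. mutant, fold above wild type)}."""
    out = {}
    for strain, rate in rates.items():
        if strain in (mutant, wild_type):
            continue
        out[strain] = (rates[mutant] / rate, rate / rates[wild_type])
    return out


for strain, (reduction, excess) in fold_changes(TABLE_3_RATES).items():
    print(f"{strain}: {reduction:.1f}x lower than GW3733+vector, {excess:.0f}x above AB1157")
```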



Example 3

Chromosomal Mapping of hMLH1

An oligonucleotide primer set was designed according
to the sequence at the 5' end of the cDNA for hMLH1. This
primer set spans a 94 bp segment. This primer set was
used in a polymerase chain reaction under the following set
of conditions:

30 seconds, 95 degrees C
1 minute, 56 degrees C
1 minute, 70 degrees C

This cycle was repeated 32 times, followed by one 5 minute
cycle at 70 degrees C. Human, mouse, and hamster DNA were
used as templates in addition to a somatic cell hybrid panel
(Bios, Inc.). The reactions were analyzed on either 8%
polyacrylamide gels or 3.5% agarose gels. A 94 base pair
band was observed in the human genomic DNA sample and in
the somatic cell hybrid sample corresponding to chromosome
3. In addition, using various other somatic cell hybrid
genomic DNA, the hMLH1 gene was localized to chromosome 3p.
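The thermal program of Example 3 can be captured as data, which makes the total programmed hold time easy to check: 32 cycles of 150 s plus a 5 minute final step gives 5100 s, or 85 minutes. This Python sketch is illustrative only. The `Step` type and the function name are hypothetical, and the total ignores ramp times between temperatures, which depend on the instrument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    seconds: int
    celsius: int


# Cycle from Example 3: denature, anneal, extend.
CYCLE = (Step(30, 95), Step(60, 56), Step(60, 70))


def hold_time_seconds(cycle: tuple[Step, ...], repeats: int, final_extension_s: int) -> int:
    """Total programmed hold time, ignoring ramp rates between temperatures."""
    return repeats * sum(step.seconds for step in cycle) + final_extension_s


total = hold_time_seconds(CYCLE, repeats=32, final_extension_s=5 * 60)
print(total, "s =", total / 60, "min")  # 5100 s = 85.0 min
```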

Example 4 

Method for Determination of Mutation of the hMLH1 Gene in
HNPCC Kindreds

cDNA was produced from RNA obtained from tissue
samples from members of HNPCC kindreds, and the cDNA was
used as a template for PCR, employing the primers
5' GCATCTAGACGTTTCCTTGGC 3' (SEQ ID No. 36) and
5' CATCCAAGCTTCTGTTCCCG 3' (SEQ ID No. 37), allowing
amplification of codons 1 to 394 of Figure 1;
5' GGGGTGCAGCAGCACATCG 3' (SEQ ID No. 38) and
5' GGAGGCAGAATGTGTGAGCG 3' (SEQ ID No. 39), allowing
amplification of codons 326 to 729 of Figure 1 (SEQ ID No.
2); and 5' TCCCAAAGAAGGACTTGCT 3' (SEQ ID No. 40) and
5' AGTATAAGTCTTAAGTGCTACC 3' (SEQ ID No. 41), allowing
amplification of codons 602 to 756 plus 128 nt of
3'-untranslated sequence. The PCR conditions for all
analyses consisted of 35 cycles at 95°C for 30 seconds,
52-58°C for 60 to 120 seconds, and 70°C for 60 to 120
seconds, in the buffer solution described in Sidransky, D.,
et al., Science, 252:706 (1991). PCR products were
sequenced using primers labeled at their 5' ends with T4
polynucleotide kinase, employing SequiTherm Polymerase
(Epicentre Technologies). The intron-exon borders of
selected exons were also determined, and genomic PCR
products were analyzed to confirm the results. PCR products
harboring suspected mutations were then cloned and
sequenced to validate the results of the direct sequencing.
PCR products were cloned into T-tailed vectors as described
in Holton, T.A. and Graham, M.W., Nucleic Acids Research,
19:1156 (1991) and sequenced with T7 polymerase (United
States Biochemical). Affected individuals from seven
kindreds all exhibited a heterozygous deletion of codons 578
to 632 of the hMLH1 gene. The derivation of five of these
seven kindreds could be traced to a common ancestor. The
genomic sequences surrounding codons 578-632 were determined
by cycle-sequencing of the P1 clones (a human genomic P1
library which contains the entire hMLH1 gene (Genome
Systems)) using SequiTherm Polymerase, as described by the
manufacturer, with primers labeled with T4 polynucleotide
kinase, and by sequencing PCR products of genomic DNA. The
primers used to amplify the exon containing codons 578-632
were 5' TTTATGGTTTCTCACCTGCC 3' (SEQ ID No. 42) and
5' GTTATCTGCCCACCTCAGC 3' (SEQ ID No. 43). The PCR product
included 105 bp of intron C sequence upstream of the exon
and 117 bp downstream. No mutations in the PCR product were
observed in the kindreds, so the deletion in the RNA was not
due to a simple splice site mutation. Codons 578 to 632
were found to constitute a single exon which was deleted
from the gene product in the kindreds described above. This
exon contains several highly conserved amino acids.

In a second family (L7), PCR was performed using the
above primers and a 4 bp deletion was observed beginning at
the first nucleotide (nt) of codon 727. This produced a
frame shift with a new stop codon 166 nt downstream,
resulting in substitution of the carboxy-terminal 29
amino acids of hMLH1 with 53 different amino acids, some
encoded by nt normally in the 3' untranslated region.

A different mutation was found in a different kindred
(L2516) after PCR using the above primers, the mutation
consisting of a 4 bp insertion between codons 755 and 756.
This insertion resulted in a frame shift and extension of
the ORF to include 102 nucleotides (34 amino acids)
downstream of the normal termination codon. The mutations
in both kindreds L7 and L2516 were therefore predicted to
alter the C-terminus of hMLH1.
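Whether an insertion or deletion shifts the reading frame depends only on whether its length is a multiple of three. The Python sketch below applies that rule to the lesions described above. It is illustrative, and the function name `frame_effect` is hypothetical. The 4 bp deletion in L7 and the 4 bp insertion in L2516 are frameshifts. The exon spanning codons 578 to 632 is 55 codons, or 165 nt. Because 165 is a multiple of three, skipping that exon would remove those residues without shifting the downstream frame.

```python
def frame_effect(indel_nt: int) -> str:
    """Classify an insertion or deletion by whether it preserves the reading frame."""
    if indel_nt <= 0:
        raise ValueError("indel length must be positive")
    return "in-frame" if indel_nt % 3 == 0 else "frameshift"


exon_nt = (632 - 578 + 1) * 3  # codons 578-632 inclusive: 55 codons = 165 nt
for label, size in [
    ("exon 578-632 deletion", exon_nt),
    ("L7 4 bp deletion", 4),
    ("L2516 4 bp insertion", 4),
]:
    print(f"{label}: {size} nt -> {frame_effect(size)}")
```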

A possible mutation in the hMLH1 gene was determined
from alterations in the size of the encoded protein, where
kindreds were too few for linkage studies. The primers
used for coupled transcription-translation of hMLH1 were
5' GGATCCTAATACGACTCACTATAGGGAGACCACCATCAGACGTTTCCCTTGGC 3'
(SEQ ID No. 44) and 5' CATCCAAGCTTCTGTTCCCG 3' (SEQ ID No.
45) for codons 1 to 394 of Figure 1, and
5' GGATCCTAATACGACTCACTATAGGGAGACCACCATGGGGGTGCAGCAGCACATCG 3'
(SEQ ID No. 46) and 5' GGAGGCAGAATGTGTGAGCG 3' (SEQ ID No.
47) for codons 326 to 729 of Figure 1 (SEQ ID No. 2). The
resultant PCR products had signals for transcription by T7
RNA polymerase and for the initiation of translation at
their 5' ends. RNA from lymphoblastoid cells of patients
from 18 kindreds was used to amplify two products,
extending from codon 1 to codon 394 or from codon 326 to
codon 729, respectively. The PCR products were then
transcribed and translated in vitro, making use of
transcription-translation signals incorporated into the PCR
primers. PCR products were used as templates in coupled
transcription-translation reactions performed as described
by Powell, S.M., et al., New England Journal of Medicine,
329:1982 (1993), using 40 µCi of 35S-labeled methionine.
Samples were diluted in sample buffer, boiled for five
minutes and analyzed by electrophoresis on sodium dodecyl
sulfate-polyacrylamide gels containing a gradient of 10% to
20% acrylamide. The gels were dried and subjected to
autoradiography. All samples exhibited a polypeptide of the
expected size, but an abnormally migrating polypeptide was
additionally found in one case.
The sequence of the relevant PCR product was determined and
found to include a 371 bp deletion beginning at the first
nucleotide (nt) of codon 347. This alteration was present
in heterozygous form and resulted in a frame shift and a
new stop codon 30 nt downstream of codon 346, thus
explaining the truncated polypeptide observed.

Four colorectal tumor cell lines manifesting
microsatellite instability were examined. One of the four
(cell line H6) showed no normal polypeptide in this assay
and produced only a short product migrating at 27 kDa. The
sequence of the corresponding cDNA was determined and found
to harbor a C to A transversion at codon 252, resulting in
the substitution of a termination codon for serine. In
accord with the translational analyses, no band at the
normal C position was identified in the cDNA or genomic DNA
from this tumor, indicating that it was devoid of a
functional hMLH1 gene.

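The H6 result can be checked with simple codon arithmetic. The sketch below is illustrative only: it assumes the C to A change falls at the middle base of the serine codon TCA (consistent with the TCA to TAA entry in Table 4), and it uses a rule-of-thumb average of about 110 Da per residue, which is an assumption and not a figure from this work, to estimate the size of a product that stops after residue 251.

```python
# Illustrative sketch: only the two codons needed here are listed,
# so this is not a complete genetic code table.
CODONS = {"TCA": "Ser", "TAA": "Stop"}

# Rule-of-thumb average residue mass in daltons (assumption, not from the text).
AVG_RESIDUE_MASS_DA = 110


def substitute(codon: str, position: int, new_base: str) -> str:
    """Replace the base at a 1-based position within a codon."""
    i = position - 1
    return codon[:i] + new_base + codon[i + 1:]


def truncated_size_kda(stop_codon_number: int) -> float:
    """Approximate mass of a product that ends just before the given stop codon."""
    residues = stop_codon_number - 1
    return residues * AVG_RESIDUE_MASS_DA / 1000


wild_type = "TCA"                       # serine at codon 252
mutant = substitute(wild_type, 2, "A")  # C -> A transversion at the middle base
print(wild_type, CODONS[wild_type], "->", mutant, CODONS[mutant])
print(f"~{truncated_size_kda(252):.1f} kDa")
```

Under these assumptions the estimate is about 27.6 kDa, which is consistent with the short product observed for H6.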
Table 4 sets forth the results of these sequencing
assays. Deletions were found in those people who were
known to have a family history of colorectal cancer.
More particularly, 9 of 10 families showed an hMLH1
mutation.
Table 4 - Summary of Mutations in hMLH1

Sample                      Codon     cDNA Nucleotide Change       Predicted Coding Change

Kindreds F2, F3, F6, F8,    578-632   165 bp deletion              In-frame deletion
F10, F11, F52

Kindred L7                  727/728   4 bp deletion                Alteration of C-terminus
                                      (TCACACATTC to TCATTCT)

Kindred L2516               755/756   4 bp insertion               Alteration of C-terminus
                                      (GTGTTAA to GTGTTTGTTAA)

Kindred RA                  347       371 bp deletion              Frameshift/Truncation

H6 Colorectal Tumor         252       Transversion (TCA to TAA)    Serine to Stop

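Whether each change in Table 4 preserves the reading frame follows from the length of the insertion or deletion modulo 3. A minimal sketch, assuming each indel lies entirely within the coding sequence; the lengths are those given in the cDNA Nucleotide Change column.

```python
def indel_effect(length_bp: int) -> str:
    """Classify a coding-region insertion or deletion by its length."""
    if length_bp % 3 == 0:
        return f"in-frame ({length_bp // 3} codons)"
    return "frameshift"


# Lengths taken from the cDNA Nucleotide Change column of Table 4.
TABLE_4 = {
    "Kindreds F2, F3, F6, F8, F10, F11, F52 (deletion)": 165,
    "Kindred L7 (deletion)": 4,
    "Kindred L2516 (insertion)": 4,
    "Kindred RA (deletion)": 371,
}

for sample, length in TABLE_4.items():
    print(f"{sample}: {length} bp -> {indel_effect(length)}")

# Codons 578 to 632 inclusive span 55 codons, i.e. 165 nucleotides.
exon_bp = (632 - 578 + 1) * 3
print(exon_bp, indel_effect(exon_bp))

# The L2516 extension of 102 nucleotides corresponds to 34 extra codons.
print(102 // 3)
```

The 165 bp exon deletion is the only in-frame change, matching the "In-frame deletion" prediction; the 4 bp and 371 bp changes all shift the frame.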

Example 5 

Bacterial Expression and Purification of hMLH2 

The DNA sequence encoding hMLH2, ATCC #75651, is
initially amplified using PCR oligonucleotide primers
corresponding to the 5' and 3' ends of the DNA sequence to
synthesize insertion fragments. The 5' oligonucleotide
primer has the sequence 5' CGGGATCCATGAAACAATTGCCTGCGGC 3'
(SEQ ID No. 48) and contains a BamHI restriction enzyme site
followed by 17 nucleotides of hMLH2 following the
initiation codon. The 3' primer, 5'
GCTCTAGACCAGACTCATGCTGTTTT 3' (SEQ ID No. 49), contains
sequences complementary to an XbaI site followed by 18
nucleotides of hMLH2. The restriction enzyme sites
correspond to the restriction enzyme sites on the bacterial
expression vector pQE-9 (Qiagen, Inc., Chatsworth, CA).
pQE-9 encodes antibiotic resistance (AmpR), a bacterial
origin of replication (ori), an IPTG-regulatable
promoter/operator (P/O), a ribosome binding site (RBS), a
6-His tag and restriction enzyme sites. The amplified
sequences and pQE-9 are then digested with BamHI and XbaI.
The amplified sequences are ligated into pQE-9 and are
inserted in frame with the sequence encoding the histidine
tag and the RBS. The ligation mixture is then used to
transform E. coli strain M15/rep4 (Qiagen, Inc.), which
contains multiple copies of the plasmid pREP4, which
expresses the lacI repressor and also confers kanamycin
resistance (KanR). Transformants are identified by their
ability to grow on LB plates, and ampicillin/kanamycin
resistant colonies are selected. Plasmid DNA is isolated
and confirmed by restriction analysis. Clones containing
the desired constructs are grown overnight (O/N) in liquid
culture in LB media supplemented with both Amp (100 µg/ml)
and Kan (25 µg/ml). The O/N culture is used to inoculate a
large culture at a ratio of 1:100 to 1:250. The cells are
grown to an optical density at 600 nm (O.D.600) of between
0.4 and 0.6. IPTG (isopropyl-β-D-thiogalactopyranoside) is
then added to a final concentration of 1 mM. IPTG induces
expression by inactivating the lacI repressor, clearing the
P/O and leading to increased gene expression. Cells are
grown for an extra 3 to 4 hours and are then harvested by
centrifugation (20 min at 6000 × g). The cell pellet is
solubilized in the chaotropic agent 6 M guanidine HCl.
After clarification, solubilized hMLH2 is purified from
this solution by chromatography on a nickel-chelate column
under conditions that allow tight binding by proteins
containing the 6-His tag (Hochuli, E. et al., Genetic
Engineering, Principles & Methods, 12:87-98 (1990)).
Protein renaturation out of GnHCl can be accomplished by
several protocols (Jaenicke, R. and Rudolph, R., Protein
Structure - A Practical Approach, IRL Press, New York
(1990)). Initially, step dialysis is utilized to remove
the GnHCl. Alternatively, the purified protein isolated
from the Ni-chelate column can be bound to a second column
over which a decreasing linear GnHCl gradient is run. The
protein is allowed to renature while bound to the column
and is subsequently eluted with a buffer containing 250 mM
imidazole, 150 mM NaCl, 25 mM Tris-HCl pH 7.5 and 10%
glycerol. Finally, soluble protein is dialyzed against a
storage buffer containing 5 mM ammonium bicarbonate. The
purified protein was analyzed by SDS-PAGE. 
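The primer layout described for hMLH2 (a BamHI site followed by 17 nt after the initiation codon, and an XbaI site followed by 18 nt) can be checked mechanically. This is a sketch under stated assumptions: the first occurrence of each recognition sequence is taken as the cloning site, the bases 5' of it are treated as a clamp, and cut positions and vector sequences are not modelled. The helper name `dissect_primer` is hypothetical.

```python
from typing import Tuple

# Recognition sequences of the two enzymes named in the text.
SITES = {"BamHI": "GGATCC", "XbaI": "TCTAGA"}


def dissect_primer(primer: str, enzyme: str) -> Tuple[str, str]:
    """Split a primer into (5' extension up to and including the site, gene-specific part)."""
    site = SITES[enzyme]
    idx = primer.find(site)
    if idx < 0:
        raise ValueError(f"no {enzyme} site in {primer}")
    end = idx + len(site)
    return primer[:end], primer[end:]


FORWARD = "CGGGATCCATGAAACAATTGCCTGCGGC"  # SEQ ID No. 48
REVERSE = "GCTCTAGACCAGACTCATGCTGTTTT"    # SEQ ID No. 49

fwd_site, fwd_gene = dissect_primer(FORWARD, "BamHI")
rev_site, rev_gene = dissect_primer(REVERSE, "XbaI")

print(fwd_site, fwd_gene, len(fwd_gene) - 3)  # nucleotides after the ATG
print(rev_site, rev_gene, len(rev_gene))      # gene-specific nucleotides
```

Both counts (17 after the ATG, 18 on the 3' primer) agree with the description above.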
Example 6

Bacterial Expression and Purification of hMLH3 

The DNA sequence encoding hMLH3, ATCC #75650, is
initially amplified using PCR oligonucleotide primers
corresponding to the 5' and 3' ends of the DNA sequence to
synthesize insertion fragments. The 5' oligonucleotide
primer has the sequence 5' CGGGATCCATGGAGCGAGCTGAGAGC 3'
(SEQ ID No. 50) and contains a BamHI restriction enzyme site
followed by 18 nucleotides of hMLH3 coding sequence
starting from the presumed terminal amino acid of the
processed protein. The 3' primer, 5'
GCTCTAGAGTGAAGACTCTGTCT 3' (SEQ ID No. 51), contains
sequences complementary to an XbaI site followed by 18
nucleotides of hMLH3. The restriction enzyme sites
correspond to the restriction enzyme sites on the bacterial
expression vector pQE-9 (Qiagen, Inc., Chatsworth, CA).
pQE-9 encodes antibiotic resistance (AmpR), a bacterial
origin of replication (ori), an IPTG-regulatable
promoter/operator (P/O), a ribosome binding site (RBS), a
6-His tag and restriction enzyme sites. The amplified
sequences and pQE-9 are then digested with BamHI and XbaI.
The amplified sequences are ligated into pQE-9 and are
inserted in frame with the sequence encoding the histidine
tag and the RBS. The ligation mixture is then used to
transform E. coli strain M15/rep4 (Qiagen, Inc.), which
contains multiple copies of the plasmid pREP4, which
expresses the lacI repressor and also confers kanamycin
resistance (KanR). Transformants are identified by their
ability to grow on LB plates, and ampicillin/kanamycin
resistant colonies are selected. Plasmid DNA is isolated
and confirmed by restriction analysis. Clones containing
the desired constructs are grown overnight (O/N) in liquid
culture in LB media supplemented with both Amp (100 µg/ml)
and Kan (25 µg/ml). The O/N culture is used to inoculate a
large culture at a ratio of 1:100 to 1:250. The cells are
grown to an optical density at 600 nm (O.D.600) of between
0.4 and 0.6. IPTG (isopropyl-β-D-thiogalactopyranoside) is
then added to a final concentration of 1 mM. IPTG induces
expression by inactivating the lacI repressor, clearing the
P/O and leading to increased gene expression. Cells are
grown for an extra 3 to 4 hours and are then harvested by
centrifugation (20 min at 6000 × g). The cell pellet is
solubilized in the chaotropic agent 6 M guanidine HCl.
After clarification, solubilized hMLH3 is purified from
this solution by chromatography on a nickel-chelate column
under conditions that allow tight binding by proteins
containing the 6-His tag (Hochuli, E. et al., Genetic
Engineering, Principles & Methods, 12:87-98 (1990)).
Protein renaturation out of GnHCl can be accomplished by
several protocols (Jaenicke, R. and Rudolph, R., Protein
Structure - A Practical Approach, IRL Press, New York
(1990)). Initially, step dialysis is utilized to remove
the GnHCl. Alternatively, the purified protein isolated
from the Ni-chelate column can be bound to a second column
over which a decreasing linear GnHCl gradient is run. The
protein is allowed to renature while bound to the column
and is subsequently eluted with a buffer containing 250 mM
imidazole, 150 mM NaCl, 25 mM Tris-HCl pH 7.5 and 10%
glycerol. Finally, soluble protein is dialyzed against a
storage buffer containing 5 mM ammonium bicarbonate. The
purified protein was analyzed by SDS-PAGE. 
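The culture and induction steps of Examples 5 and 6 involve simple volume arithmetic. The sketch below is a hypothetical worked case: the 500 ml culture volume and the 1 M IPTG stock are assumptions (neither is given in the text), a 1:100 inoculation is read as one volume of overnight culture per 100 volumes of large culture, and the small volume added by the inoculum and IPTG is ignored.

```python
# Hypothetical values for illustration; neither is given in the text.
LARGE_CULTURE_ML = 500.0
IPTG_STOCK_MM = 1000.0  # a 1 M stock
IPTG_FINAL_MM = 1.0     # final concentration stated in the protocol


def inoculum_ml(culture_ml: float, ratio: float) -> float:
    """Overnight-culture volume for a 1:ratio inoculation."""
    return culture_ml / ratio


def stock_volume_ul(final_mm: float, stock_mm: float, culture_ml: float) -> float:
    """C1 * V1 = C2 * V2, solved for V1 and returned in microlitres."""
    return final_mm * culture_ml / stock_mm * 1000


def ready_to_induce(od600: float) -> bool:
    """True inside the O.D.600 window of 0.4 to 0.6 given in the protocol."""
    return 0.4 <= od600 <= 0.6


print(inoculum_ml(LARGE_CULTURE_ML, 100), inoculum_ml(LARGE_CULTURE_ML, 250))
print(stock_volume_ul(IPTG_FINAL_MM, IPTG_STOCK_MM, LARGE_CULTURE_ML))
print(ready_to_induce(0.5), ready_to_induce(0.8))
```

For the assumed 500 ml culture this gives 5 ml to 2 ml of overnight culture and 500 µl of 1 M IPTG.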

Example 7 

Method for determination of mutation of hMLH2 and hMLH3 in
hereditary cancer
Isolation of Genomic Clones 

A human genomic P1 library (Genomic Systems, Inc.) was
screened by PCR using primers selected from the cDNA
sequences of hMLH2 and hMLH3. Two clones were isolated for
hMLH2 using primers 5' AAGCTGCTCTGTTAAAAGCG 3' (SEQ ID No.
52) and 5' GCACCAGCATCCAAGGAG 3' (SEQ ID No. 53),
resulting in a 133 bp product. Three clones were isolated
for hMLH3 using primers 5' CAACCATGAGACACATCGC 3' (SEQ ID
No. 54) and 5' AGGTTAGTGAAGACTCTGTC 3' (SEQ ID No. 55),
resulting in a 121 bp product. Genomic clones were
nick-translated with digoxigenin-deoxyuridine
5'-triphosphate (Boehringer Mannheim), and FISH was
performed as described (Johnson, C.V. et al., Methods Cell
Biol., 35:73-99 (1991)). Hybridization with the hMLH3 probe
was carried out using a vast excess of human Cot-1 DNA for
specific hybridization to the expressed hMLH3 locus.
Chromosomes were counterstained with
4',6-diamidino-2-phenylindole and propidium iodide,
producing a combination of C- and R-bands. Aligned images
for precise mapping were obtained using a triple-band
filter set (Chroma Technology, Brattleboro, VT) in
combination with a cooled charge-coupled device camera
(Photometrics, Tucson, AZ) and variable excitation
wavelength filters (Johnson, C.V. et al., Genet. Anal.
Tech. Appl., 8:75 (1991)). Image collection, analysis and
chromosomal fractional length measurements were done using
the ISee Graphical Program System (Inovision Corporation,
Durham, NC).

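The 133 bp product quoted for the hMLH2 screening primers can be reproduced in silico from SEQ ID NO:3. This is an illustration, not the screening method used: it does an exact-match search with no mismatches allowed, the template is an excerpt of the printed listing (assuming the first printed line is exactly 60 nt, so the excerpt covers the last 3 nt of that line and positions 61 to 190), and one illegible base in the printed copy is written as N.

```python
from typing import Optional

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (N is kept as an unknown base)."""
    return seq.translate(_COMPLEMENT)[::-1]


def product_length(template: str, forward: str, reverse: str) -> Optional[int]:
    """Length from the 5' end of the forward primer to the 5' end of the reverse primer."""
    start = template.find(forward)
    site = template.find(reverse_complement(reverse))
    if start < 0 or site < start:
        return None
    return site + len(reverse) - start


# Excerpt of SEQ ID NO:3 as printed; N marks one illegible base.
TEMPLATE = (
    "AAG"
    "CTGCTCTGTT" "AAAAGCGAAA" "ATGAAACAAT" "TGCCTGCGGC" "AACAGTTCGA" "CTCCTTTCAA"
    "CTTCTCAGAT" "CATCANTTCG" "GTGGTCAGTG" "TTGTAAAAGA" "GCTTATTGAA" "AACTCCTTGG"
    "ATGCTGGTGC"
)

print(product_length(TEMPLATE, "AAGCTGCTCTGTTAAAAGCG", "GCACCAGCATCCAAGGAG"))
```

The forward primer (SEQ ID No. 52) matches the start of the excerpt and the reverse complement of SEQ ID No. 53 matches its end, giving 133 bp.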
Transcription coupled Translation Mutation Analysis 

For purposes of IVSP analysis, the hMLH2 gene was
divided into three overlapping segments. The first segment
included codons 1 to 500, the middle segment included
codons 270 to 755, and the last segment included codons 485
to the translational termination site at codon 933. The
primers for the first segment were 5'
GGATCCTAATACGACTCACTATAGGGAGACCACCATGGAACAATTGCCTGCXSG 3'
(SEQ ID No. 56) and 5' CCTGCTCCACTCATCTGC 3' (SEQ ID No.
57); for the middle segment, 5'
GGATCCTAATACGACTCACTATAGGGAGACCACCATGGAAGATATCTTAAAGTTAATCCG 3'
(SEQ ID No. 58) and 5' GGCTTCTTCTACTCTATATGG 3' (SEQ ID No.
59); and for the final segment, 5'
GGATCCTAATACGACTCACTATAGGGAGACCACCATGGCAGGTCTTGAAAACTCTTCG 3'
(SEQ ID No. 60) and 5' AAAACAAGTCAGTGAATCCTC 3' (SEQ ID No.
61). The primers used for mapping the stop mutation in
patient CW all used the same 5' primer as the first
segment. The 3' nested primers were: 5'
JUVGCACATCTOTTTCTGCTG 3' (SEQ ID No. 62), codons 1 to 369;
5' ACGAGTAGATTCCTTTAGGC 3' (SEQ ID No. 63), codons 1 to
290; and 5' CAGAACTGACATGAGAGCC 3' (SEQ ID No. 64), codons 1 to
214. 
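Two pieces of reasoning in the IVSP design can be made explicit: where the three hMLH2 segments overlap, and how nested products that all start at codon 1 bracket a stop codon (a product runs short only if the stop lies within its codon range). The observation pattern below is hypothetical; it is simply the pattern one would expect for a stop at codon 233, the codon listed for patient CW in Table 5. The function names are illustrative.

```python
from typing import Dict, Optional, Tuple

Range = Tuple[int, int]

# Codon ranges of the three overlapping IVSP segments of hMLH2 described above.
SEGMENTS: Dict[str, Range] = {"first": (1, 500), "middle": (270, 755), "last": (485, 933)}


def overlap(a: Range, b: Range) -> Optional[Range]:
    """Shared codon range of two segments, or None if they do not overlap."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def localize_stop(truncated_by_end: Dict[int, bool]) -> Range:
    """Bracket a stop codon from nested products that all start at codon 1.

    truncated_by_end maps each product's last codon to True if that product
    ran short. At least one product must be truncated.
    """
    short_ends = [end for end, short in truncated_by_end.items() if short]
    full_ends = [end for end, short in truncated_by_end.items() if not short]
    lower = max(full_ends) + 1 if full_ends else 1
    return lower, min(short_ends)


print(overlap(SEGMENTS["first"], SEGMENTS["middle"]))
print(overlap(SEGMENTS["middle"], SEGMENTS["last"]))

# Hypothetical observation for the nested products ending at codons 369, 290 and 214.
print(localize_stop({369: True, 290: True, 214: False}))
```

With that pattern the stop is bracketed to codons 215 to 290.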

For analysis of hMLH3, the hMLH3 cDNA was amplified as
a full-length product or as two overlapping segments. The
primers for full-length hMLH3 were 5'
GGATCCTAATACGACTCACTATAGGGAGACCACCATGCSAGCGAGCTK^ 3'
(SEQ ID No. 65) and 5' AGGTTAGTGAAGACTCTGTC 3' (SEQ ID No.
66) (codons 1 to 863). For segment 1, the sense primer was
the same as above and the antisense primer was 5'
CTGAGGTCTCAGCAGGC 3' (SEQ ID No. 67) (codons 1 to 472).
Segment 2 primers were 5'
GGATCCTAATACGACTCACTATAGGGAGACCACCATGGTGTCCATTTCCAGACTGCG 3'
(SEQ ID No. 68) and 5' AGGTTAGTGAAGACTCTGTC 3' (SEQ ID No.
69) (codons 415 to 863). Amplifications were done as
described below.

The PCR products contained recognition signals for
transcription by T7 RNA polymerase and for the initiation
of translation at their 5' ends. PCR products were used as
templates in coupled transcription-translation reactions
containing 40 µCi of 35S-methionine (NEN, Dupont). Samples
were diluted in SDS sample buffer and analyzed by
electrophoresis on SDS-polyacrylamide gels containing a
gradient of 10 to 20% acrylamide. The gels were fixed,
treated with EnHance (Dupont), dried and subjected to
autoradiography.
RT-PCR and Direct Sequencing of PCR Products

cDNAs were generated from RNA of lymphoblastoid or
tumor cells with Superscript II (Life Technologies). The
cDNAs were then used as templates for PCR. The conditions
for all amplifications were 35 cycles of 95°C for 30 s,
52°C to 62°C for 60 to 120 s, and 70°C for 60 to 120 s, in
buffer. The PCR products were directly sequenced and cloned
into the T-tailed cloning vector PCR2000 (Invitrogen) and
sequenced with T7 polymerase (United States Biochemical).
For the direct sequencing of PCR products, PCR reactions
were first phenol-chloroform extracted and ethanol
precipitated. Templates were directly sequenced using
SequiTherm polymerase (Epicentre Technologies) and
γ-32P-labelled primers as described by the manufacturer.

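The total hold time of the amplification program described above follows directly from the step durations. A minimal sketch that counts only the three stated steps per cycle; ramp times, any initial denaturation and any final extension are not stated in the text and are not included.

```python
def program_seconds(cycles: int, step_seconds: list) -> int:
    """Total hold time of a cycling program, ignoring ramp times."""
    return cycles * sum(step_seconds)


# 35 cycles: denaturation 30 s, annealing 60-120 s, extension 60-120 s.
shortest = program_seconds(35, [30, 60, 60])
longest = program_seconds(35, [30, 120, 120])
print(shortest / 60, longest / 60)  # minutes
```

The hold time therefore ranges from about 87.5 to 157.5 minutes depending on the annealing and extension times chosen.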


Intron/Exon Boundaries and Genomic Analysis of Mutations 

Intron/exon borders were determined by cycle-sequencing
P1 clones using γ-32P end-labelled primers and SequiTherm
polymerase as described by the manufacturer. The primers
used to amplify the hMLH2 exon containing codons 195 to 233
were 5' TTATTTGGGAGAAAAGCACSAG 3' (SEQ ID No. 70) and 5'
TTAAAAGACTAACCTCTTGCC 3' (SEQ ID No. 71), which produced a
215 bp product. The product was cycle sequenced using the
primer 5' CTGGTGTTATGAAGAATATGG 3' (SEQ ID No. 72). The
primers used to analyze the genomic deletion of hMLH3 in
patient GG were: for the 5' region amplification, 5'
CAGAAGCAGTTGCAAAGCC 3' (SEQ ID No. 73) and 5'
AAACCXSTACTCTTCACACAC 3' (SEQ ID No. 74), which produce a
74 bp product containing codons 233 to 257; primers 5'
GAGGAAAAGCITTTGTTGGC 3' (SEQ ID No. 75) and 5'
CAGTGGCTGCTGACTGAC 3' (SEQ ID No. 76), which produce a 93 bp
product containing the codons 347 to 377; and primers 5'
TCCAGAACCAAGAAGGAGC 3' (SEQ ID No. 77) and 5'
TGAGGTCTCAGCAGGC 3' (SEQ ID No. 78), which produce a 99 bp
product containing the codons 439 to 472 of hMLH3. 
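The logic of the three genomic amplicons can be sketched as a presence/absence prediction. This assumes the deletion removes the coding sequence between codons 268 and 669 (the range listed in Table 5) and considers only the deleted allele; if the patient is heterozygous, the normal allele would still yield all three products. The function name is illustrative.

```python
from typing import Dict, Tuple

Range = Tuple[int, int]

# Codon ranges covered by the three genomic amplicons described above.
AMPLICONS: Dict[str, Range] = {
    "SEQ ID Nos. 73/74 (74 bp)": (233, 257),
    "SEQ ID Nos. 75/76 (93 bp)": (347, 377),
    "SEQ ID Nos. 77/78 (99 bp)": (439, 472),
}
DELETED_CODONS: Range = (268, 669)  # deleted codon range listed in Table 5


def retained_on_deleted_allele(amplicon: Range, deleted: Range) -> bool:
    """True if the amplicon lies entirely outside the deleted codon range."""
    start, end = amplicon
    return end < deleted[0] or start > deleted[1]


for name, codons in AMPLICONS.items():
    print(name, retained_on_deleted_allele(codons, DELETED_CODONS))
```

Only the 5' amplicon (codons 233 to 257) lies outside the deleted range, so it serves as the flanking control while the other two fall inside the deletion.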
TABLE 5

Summary of Mutations in hMLH2 and hMLH3
from patients affected with HNPCC

Sample       Codon        cDNA Change          Genomic Nucleotide   Predicted Coding
                                               Change               Change

hMLH2

CW           233          Skipped exon         CAG to TAG           Gln to Stop codon

hMLH3

MM, NS, TF   20           CGG to CAG           CGG to CAG           Arg to Gln

GC           268 to 669   1,203 bp deletion    Deletion             In-frame deletion

GCx          268 to 669   1,203 bp deletion    Deletion             Frameshift, truncation



Numerous modifications and variations of the present 
invention are possible in light of the above teachings and, 
therefore, within the scope of the appended claims, the invention 
may be practiced otherwise than as particularly described. 
(A) TELEPHONE: 201-994-1700

(B) TELEFAX: 201-994-1744 



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 2525 BASE PAIRS

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:



GTTGAACATC TAGACGTTTC CrTGGCTCTT CTGGCGCCAA AATGTCX3TTC GTGGCAGGGG 60
TTATTCGGCG GCTGGACGAG ACAGTGGTGA ACCGCATCGC GGCGGGGGAA GTTATCCAGC 120
GGCCAGCTAA TGCTATCAAA GAC5ATGATTG AGAACTGTTT AGATGCAAAA TCCACAAGTA 180
TTCAAGTGAT TGTTAAAGAG GGAGGCCTGA AGTTGATTCA GATCCAAGAC AATGGCACCG 240
GGATCAGGAA AGAAGATCTG GATATTGTAT GTGAAAGTGT CACTACTAGT AAACTGCAGT 300
CCTTTGAGGA TTTASCCAGT ATTTCTATCT ATGGCTTTCG AGGTGAGGCT TTGGCCAGCA 360
TAAGCCATGT GGCTCATGTT ACTATTACAA CX5AAAACAGC TGATGGAAAG TGTGCATACA 420
GAGCAAGTTA CTCAGATGGA AAACTGAAAG CCCCTCCTAA ACCATGTGCT GGCAATCAAG 480
GGACCCAGAT CACGGTGGAG GACCTTTTTT ACAACATAGC CACGAGGAGA AAAGCTITAA 540
AAAATCCAAG TGAAGAATAT GGGAAAATTT TGGAAGTTGT TGGCAGGTAT TCAGTACACA 600
ATGCAGGCAT TAGTTTCTCA GTTAAAAAAC AAGGAGAGAC AGTAGCTGAT GTTAGGACAC 660
TACCCAATGC CTCAACCXTTG GACAATATTC GCTCCGTCTT GGGAAATGCT GTTAGTCGAG 720
AACTGATAGA AATTGGATGT GAGGATAAAA CCCTAGCCTT CAAAATGAAT GGTTACATAT 780
CCAATGCAAA CTACTCAGTG AAGAAGTGCA TCTTCTTACT CTTCATCAAC CATCGTCTGG 840
TAGAATCAAC TTCCTTGAGA AAAGCCATAG AAACAGTGTA TGCAGCCTAT TTGCCAAAAA 900
ACACACACCC ATTCCTGTAC CTCAGTTTAG AAATCAGTCC CCAGAATGTG GATGTTAATG 960
TGAACCCCAC AAAGCATGAA GTrCACTTCC TGCACGAGGA GAGCATCCTG GAGCGGGTGC 1020
AGCAGCACAT CGAGAGCAAG CTCCTGGGCT CCAATTCCTC CAGGAT6TAC TTCACCCAGA 1080
CTTTGCTACC AGGACTTGCT GGCCCCTCTG GGGAGATGGT TAAATCCACA ACAAGTCTCA 1140
CCTCGTCTTC TACTTCTGGA AGTAGTGATA AGGTCTATGC CCACCAGATG GTTOGTACAG 1200
ATTCCCGGGA ACAGAAGCTT GATGCATTTC TGCAGCCTCT GAGCAAACCC CTGTCCAGTC 1260
AGCCCCAGGC CATTGTCACA GAGGATAAGA CAGATATTTC TAGTGGCAGG GCTAGGCAGC 1320
AAGATGAGGA GATGCTTGAA CTCCCAGCCC CTGCTGAAGT GGCTGCCAAA AATCAGAGCT 1380
TGGAGGGGGA TACAACAAAG GGGACTTCAG AAATGTCAGA GAAGAGAGGA CCTACTTCCA 1440
GCAACCCCAG AAAGAGACAT CGGGAAGATT CTGATCTCCA AATCCTCGAA GATGATTCCC 1500
GAAAGGAAAT GACTGCAGCT TGTACCCCCC GGAGAAGGAT CATTAACCTC ACTAGTGTTT 1560
TGAGTCTCCA GGAAGAAATT AATGAGCAGG GACATGAGGT TCTCCGGGAG ATGTTGCATA 1620
ACCACTCCTT CGTGGGCTGT GTGAATCCTC AGTGGGCCTT GGCACAGCAT CAAACCAAGT 1680
TATAGCTTCT CAACACCACC AAGCTTAGTG AAGAACTGTT CTACCAGATA CTCATTTATG 1740
ATTTTGCCAA TTTTGGTGTT CTCAGGTTAT CGGAGCCAGC ACCGCTCTTT GACCTTGCCA 1800
TGCrrCCCTT ACATAGTCCA GAGAGTGGCT GGACAGAGGA AGATGGTCCC AAAGAAGGAC 1860
TTGCTGAATA CATTGTTGAG TTTCTGAAGA AGAAGGCTGA GATGCTTGCA GACTATTTCT 1920
CTTTGGAAAT TGATGAGGAA GGGAACCTGA TTGGATTACC CCTTCTGATT GACAACTATG 1980
TGCCCCCTTT GGAGGGACTG CCTATCTTCA TTCTTCCACT AGCCACTGAG GTGAATTGGG 2040
ACGAAGAAAA GGAATGTTTT GAAAGCCTCA GTAAAGAATG CGCTATGTrC TATTCCATCC 2100
GGAAGCAGTA CATATCTGAG GAGTCGACCC TCTCAGGCCA GCAGAGTGAA GTGCCTGGCT 2160
CCATTCCAAA CTCCTGGAAG TGGACTGTGG AACACATTGT CTATAAAGCC TTGCX3CTCAC 2220
ACATTCTGCC TCCTAAACAT TCCACAGAAG ATGGAAATAT CCTGCAGCTT GCTAACCTGC 2280
CTGATCTATA CAAAGTCTTT GAGAGGTGTT AAATATGGTT ATTTATGCAC TGTGGGATGT 2340
GTTCTTCTTT CTCTGTATrC CGATACAAAG TGTTGTACTA AAGTGTGATA TACAAAGTGT 2400
ACCAACATAA GTGTTGGTAG CACTTAAGAC TTATACTTGC CTTCTGATAG TATTCCTTTA 2460
TACACAGTGG ATTGATTATA AATAAATAGA TGTGTCTTAA CATAAAAAAA AAAAAAAAAA 2520
AAAAA 2525
(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 756 AMINO ACIDS

(B) TYPE: AMINO ACID

(C) STRANDEDNESS:

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: PROTEIN 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 



Met Ser Phe Val Ala Gly Val Ile Arg Arg Leu Asp Glu Thr Val
                5                   10                  15


V d X 




y 


He 


Ala 


Ala 


Glv 


Glu 


Val 


He 


Gin Arg 


Pro 


Ala 


Asn 




























30 










Met 


He 


Glu 


Asn 




Leu 


Asn Als 


Lys 


Ser 


Thr 






























OCX 






vox 


xxc 


Val 

V OIX 


Xijr o 


Glu 


Glv 


Glv 
uxy 




Leu 


He 


Gin 






























Tl 0 










XIXL 


i»xy 


Tie 
xxcs 


Axg 


XI jr D 


Glu Aen 




Asp 


He 




















/ U 










va.j. 


Cys 


oXU 


Arg 




IXix 


lllx 


OCX 


Xiys 


XlCU 


G1 n 

V9XX1 OCX 




Gl 11 






























L6U 




G 

oer 


Tl o 


OCX 


iXlx 


Tyr 


v»xy 


jrne 


Axy 


Glv Gill 
V3xy oxu 


Ala 


T.011 

XlCU 


Ala 










7b 










1 An 








XUS 


oeir 


Tl a 

xxe 




XlXS 


Vox 


&1 a 
AXo 


XlXS 


Val 
Vax 




Tl o 
xxe 


XXiX XXlX 


xty o 


T>1T" 


Ala 










J.1U 










n "1 c 

XXd 








X^U 


Asp 


Gly 


Lys 


Cys 


Axa 


Tyr 


Arg 


Axa 


Ser 


Tyr 


oer Asp 


/^l 


Lys 


xieu 










X^ D 










X J U 








X O 9 


L^S 


TV 1 a 


Pro 


Pro 


Jjyo 


XrXtJ 




AXd 


Gl V 




Gl n Gl V 

UXU urXjr 


Thr* 

X XXL 


Gl n 


Tl 
X xc 










Xft Vl 










X4 3 








X3 U 


Thr 


Vdx 


m 11 

VjrX, U 


TV 


XJCIX 


It lie 


lyr 


Asn 


T 1 #» 
X xt? 


Al a 


XIXL AX^ 




XJ jr o 


Ala 








X39 










XDU 








X W 3 


Leu Lys Asn Pro Ser Glu Glu Tyr Gly Lys Ile Leu Glu Val Val
                170                 175                 180
Gly Arg Tyr Ser Val His Asn Ala Gly Ile Ser Phe Ser Val Lys
                185                 190                 195
Lys Gln Gly Glu Thr Val Ala Asp Val Arg Thr Leu Pro Asn Ala
                200                 205                 210
Ser Thr Val Asp Asn Ile Arg Ser Val Phe Gly Asn Ala Val Ser
                215                 220                 225
Arg Glu Leu Ile Glu Ile Gly Cys Glu Asp Lys Thr Leu Ala Phe
                230                 235                 240
Lys Met Asn Gly Tyr Ile Ser Asn Ala Asn Tyr Ser Val Lys Lys
                245                 250                 255
Cys Ile Phe Leu Leu Phe Ile Asn His Arg Leu Val Glu Ser Thr
                260                 265                 270
Ser Leu Arg Lys Ala Ile Glu Thr Val Tyr Ala Ala Tyr Leu Pro
                275                 280                 285
Lys Asn Thr His Pro Phe Leu Tyr Leu Ser Leu Glu Ile Ser Pro
                290                 295                 300
Gln Asn Val Asp Val Asn Val His Pro Thr Lys His Glu Val His
                305



Phe 


Leu 


His 


Glu Glu 


Ser 


He 


Leu 


Glu 


ser 


Lys 


Leu Leu 


Gly 


Ser 


Asn 






"5 R 








Gin 


Thr 


Leu 


Leu Pro 

7 n 

J 3 U 


Gly 


Leu 


Ala 


Lys 


Ser 


Thr 


Thr Ser 


Leu 


Thr 


Ser 














Asp 


Lys 


val 


Tyr Ala 


His 


Gin 


Met 


Gin 


Lys 


Leu 


Asp Ala 
395 


Phe 


Leu 


Gin 


Ser 


Gin 


Pro 


Gin Ala 
410 


He 


Val 


Thr 


Ser 


Gly 


Arg 


Ala Arg 

'A A3 


Gin 


Gin 


Asp 


Ala 


Pro 


Ala 


Glu Val 


Ala 


Ala 


Lys 


Thr 


Thr 


Lys 


Gly Thr 
455 


Ser 


Glu 


Met 


Ser 


Ser 


Asn 


Pro Arg 

470 


Lys 


Arg 


His 


Met 


val 


Glu 


Asp Asp 

/toe 


Ser 


Arg 


Lys 


Pro 


Arg 


Arg 


Arg lie 

tzf\n 
o UU 


He 


Asn 


Leu 


Glu 


Glu 


lie 


Asn Glu 


Gin 


Gly 


His 


His 


Asn 


His 


Ser Phe 

b jO 


Val 


Gly 


Cys 


Ala 


Gin 


His 


Gin Thr 
545 


Lys 


Leu 


Tyr 


Ser 


Glu 


Glu 


Leu Phe 
560 


Tyr 


Gin 


He 


Phe 


Gly 


val 


Leu Arg 

c c 
575 


Leu 


Ser 


Glu 


Ala 


fie u 


Leu 


Ala Leu 


Asp 


Ser 


Pro 


Asp 


Gly 


Pro 


Lys Glu 


Gly 


Leu 


Ala 


Lys 


Lys 


Lys 


AXcL V3J.U 

on 
bz U 




Leu 


m a 

i\±ci 


Asp 


oxU 


urXU 


vjxy Asn 
635 


Leu 






Tyr 


Val 


Pro 


Pro Leu 


Glu 


Gly 


Leu 






650 








Ala 


Thr 


Glu 


Val Asn 
665 


Trp 


Asp 


Glu 


Leu 


Ser 


Lys 


Glu Cys 


Ala 


Met 


Phe 






680 








lie 


Ser 


Glu 


Glu Ser 


Thr 


Leu 


Ser 





310 










315 


oXU 


Arg 
325 


VaJ. 


oJ.n 


(jin 


Ills 


Xxc 

330 


Cat" 
OCX 


Ser 
340 


Axg 


ItIcC 


Tyr 


XrllC 


XXIX 

345 




Pro 


OCX 


oiy 


uXU 


Mot- 
ive u 


vctx 




355 








360 


Car- 


Ser 
370 


iiir 


Cov 

ocr 


^ly 


Car* 

OCJL 


Car" 

OCX 

375 


vai 


Arg 

385 


Tnr 


Asp 


ser 


Arg 


oiu 

390 


Pro 


Leu 

400 


Ser 


Lys 


Pro 


T .01 1 
J-icU 


Cot* 
Otf X 

405 




415 


Xljr O 


1. 4lX 




Tie 

X xc 


OCX. 

420 


V3XU 


wX \JL 

430 


nc k' 






uc u 


Pro 
435 




m Ti 

OXII 

445 


OCX 


XicU 


m 11 

wXU 


VarX jf 


450 


Ser 


Glu 
460 


Lys 


Arg 


Gly 


Pro 


Tnr 
465 


Arg 


Glu 
475 


Asp 


Ser 


Asp 


Val 


Glu 
480 


Glu 


Met 
490 


Thr 


Ala 


Ala 


Cys 


Thr 
495 


Tnr 


Ser 

505 


vax 


Leu 


Ser 


Leu 


oxn 
510 


LsJ-U 


vai 


Leu 


Arg 


ijXU 


Mat- 
wet. 


Leu 




520 








525 


val 


TV MM 

Asn 
535 


Pro 


Gin 


Trp 


Aia 


Leu 

540 


Leu 


Leu 
550 


Asn 


Thr 


Thr 


Lys 


Leu 

555 


Leu 


He 
565 


Tyr 


Asp 


Phe 


Ala 


Asn 
570 


Pro 


Ala 
580 


Pro 


Leu 


Phe 


Asp 


Leu 

c o c 

585 


Glu 


Ser 
595 


Gly 


Trp 


Thr 


Glu 


Glu 
600 


Glu 


Tyr 
610 


He 


val 


Glu 


Pne 


Leu 

615 


Asp 


Tyr 
625 


pne 


Ser 


Leu 


CjIU 


lie 
630 


Leu 


Pro 


Leu 


Leu 


Thr 


Asp 


Asn 












645 


Pro 


He 
655 


Phe 


He 


Leu 


Arg 


Leu 
660 


Glu 


Lys 
670 


Glu 


Cys 


Phe 


Glu 


Ser 
675 


Tyr 


Ser 
685 


He 


Arg 


Lys 


Gin 


Tyr 
690 


Gly 


Gin 


Gin 


Ser 


Glu 


val 


Pro 
695










700 








Giy 


Oat* 


T 1 o 






xrp 


XJ jr O 




X XIX 


Val 


Glu His 


lie 






710 










715 








Tyr 


Lys 


Ala 


Leu Arg 


Ser 


His 


He 


Leu 


Pro 


Pro 


Lys His 


Phe 




725 










730 








Glu 


Asp 


Gly 


Asn lie 


Leu 


Gin 


Leu 


Ala 


Asn 


Leu 


Pro Asp 


Leu 




74 0 










745 








Lys 


Val 


Phe 


Glu Arg 


Cys 




















755 



















705 
Val 
720 
Thr 
735 
Tyr 
750 



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 3063 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: cDNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:



GGCACGAGTG GCTGCITGCG GCTAGTGGAT GGTAATTGCC TGCCTCXSCGC TAGCAGCAAG 60 

CTGCTCTGTT AAAAGCGAAA ATGAAACAAT TGCCTGCGGC AACAGTTCGA CTCCTTTCAA 120 

CTTCTCAGAT CATCAOTTCG GTGGTCAGTG TTGTAAAAGA GCTTATTGAA AACTCCTTGG 180 

ATGCTGGTGC CACAAGCGTA GATGTTAAAC TGGAGAACTA TGGATTTGAT AAAATTGAGG 240 

TGCGAGATAA CXSGGGAGGGT ATCAAGGCTG TTGATGCACC TGTAATGGCA ATGAAGTACT 300 

ACACCTCAAA AATAAATAGT CATGAAGATC TTGAAAATTT GACAACTTAC GGTTTTCGTG 360 

GAGAAGCCTT GGGGTCAATT TGTTGTATAG CTGAGGTTTT AATTACAACA AGAACGGCTG 420 

CTGATAATTT TAGCACCCAG TATGTTTTAG ATGGCAGTGG CCACATACTT TCTCAGAAAC 480 

CTTCACATCT TGGTCAAGGT ACAACTGTAA CTGCTTTAAG ATTATTTAAG AATCTACCTG 540

TAAGAAAGCA GTTTTACTCA ACTGCAAAAA AATGTAAAGA TGAAATAAAA AAGATCCAAG 600 

ATCTCCTCAT GAGCITTGGT ATCCTTAAAC CTGACTTAAG GATTGTCTTT GTACATAACA 660 

AGGCAGTTAT TTGGCAGAAA AGCAGAGTAT CAGATCACAA GATGGCTCTC ATGTCS^GTTC 720 

TGGGGACTGC TGTTATGAAC AATATGGAAT CCTTTCAGTA CCACTCTGAA GAATCTCAGA 780 

TTTATCTCAG TGGATTTCTT CCAAAGTGTG ATGCAGACCA CTCTTTCACT AGTCTTTCAA 840 

CACCAGAAAG AAGTTTCATC TTCATAAACA GTCGACCAGT ACATCAAAAA GATATCTTAA 900 

A6TTAATCCX3 ACATCATTAC AATCTGAAAT GCCTAAAGGA ATCTACTCGT TrGTATCCTG 960 

TrZTCTTTCr GAAAATCGAT GTTCCTACAG CTGATGTTGA TGTAAATTTA ACACCAGATA 1020 

AAAGCCAAGT ATTATTACAA AATAAGGAAT CTG'lTl'l'AAT TGCTCTTGAA AATCTGATGA 1080 

CGACTTGTTA TGGACCATTA CCTAGTACAA ATTCTTATGA AAATAATAAA ACAGATGTTT 1140 

CCGCAGCTGA CATCGTrCTT AGTAAAACAG CAGAAACAGA TGTGCTTTTT AATAAAGTGG 1200 

AATCATCTGG AAAGAATTAT TCAAATGTTG ATACTTCAGT CATTCCATTC CAAAATGATA 1260 

TGCATAATGA TGAATCTGGA AAAAACACTG ATGATTGTTT AAATCACCAG ATAAGTATTG 1320

GTGACTTTGG TTATGGTCAT TGTAGTAGTG AAATTTCTAA CATTGATAAA AACACTAAGA 1380 

ATGCATTTCA GGACATTTCA ATGAGTAATG TATCATGGGA GAACTCTCAG ACGGAATATA 1440 

GTAAAACTTG TTTTATAAGT TCCGTTAAGC ACACCCAGTC AGAAAATGGC AATAAAGACC 1500 

ATATAGATGA GAGTGGGGAA AATGAGGAAG AAGCAGGTCT TGAAAACTCT TCGGAAATTT 1560 

CTGCAGATGA GTGGAGCAGG GGAAATATAC TTAAAAATTC AGTGGGAGAG AATATTGAAC 1620 

CTGTGAAAAT TTTAGTGCCT GAAAAAAGTT TACCATGTAA AGTAAGTAAT AATAATTATC 1680 

CAATCCCTGA ACAAATGAAT CTTAATGAAG ATTCATGTAA CAAAAAATCA AATGTAATAG 1740 

ATAATAAATC TGGAAAAGTT ACAGCTTATG ATTTACTTAG CAATCGAGTA ATCAAGAAAC 1800 

CCATGTCAGC AAGTGCTCTT TTTGTTCAAG ATCATCGTCC TCAGTTTCTC ATAGAAAATC 1860 

CTAAGACTAG TITAGAGGAT GCAACACTAC AAATTGAAGA ACTGTGGAAG ACATTGAGTG 1920 

AAGAGGAAAA ACTGAAATAT GAAGAGAAGG CTACTAAAGA CrTGGNACGA TACAATAGTC 1980 

AAATGAAGAG AGCCATTGAA CAGGAGTCAC AAATGTCACT AAAAGATG6C AGAAAAAAGA 2040 

TAAAACCCAC CAGCGCATGG AATTTGGCCC AGAAGCACAA GTTAAAAACC TCATTATCTA 2100 

ATCAACCAKA ACTTGATGAA CTCCITCAGT CCCAAATTGA AAAAAGAAGG AGTCAAAATA 2160

TTAAAATGGT ACAGATCCCC TTTTCTATGA AAAACTTAAA AATAAATTTT AAGAAACAAA 2220

ACAAAGTTGA CTTAGAAGAG AAGGATGAAC CTTGCTTGAT CCACAATCTC AGGTTTCCTG 2280

ATGCATGGCT AATGACATCC AAAACAGAGG TAATGTTATT AAATCCATAT AGAGTAGAAG 2340

AAGCCCTGCT ATTTAAAAGA CTTCTTGAGA ATCATAAACT TCCTGCAGAG CCACTGGAAA 2400

AGCCAATTAT GTTAACAGAG AGTCTTTTTA ATGGATCTCA TTATTTAGAC GTrTTATATA 2460

AAATGACAC3C AGATGACCAA AGATACAGTG GATCAACTTA CCTGTCTGAT CCTCGTCTTA 2520 

CAGCGAATGG TTTCAAGATA AAATTGATAC CAGGAGTTTC AATTACTGAA AATTACTTGG 2580 

AAATAGAAGG AATGGCTAAT TGTCTCCCAT TCTATGGAGT AGCAGATTTA AAAGAAATTC 2640

TTAATGCTAT ATTAAACAGA AATGCAAAGG AAGTTTATGA ATGTAGACCT CGCAAAGTGA 2700 

TAAGTTATTT AGAGGGAGAA GCAGTGCGTC TATCCAGACA ATTACCCATG TACTTATCAA 2760

AAGAGGACAT CCAAGACATT ATCTACAGAA TOAAGCACCA GTTTGGAAAT GAAATTAAAG 2820 

AGTGTGTTCA TGGTCGCCCA TTTTTTCATC ATTTAACCTA TCTTCCAGAA ACTACMGAT 2880 

TAAATATGTT TAAGAAGATT AGTTACCATT GAAATTGGTT CTGTCATAAA ACAGCMGAG 2940

TCTGGTTTTA AATTATCTTT GTATTATGTG TCACATGGIT ATTTTrTAAA T6AGGATTCA 3000 

CTGACTTGTT TTTATATTGA AAAAAGTTCC ACGTATTGTA GAAAACGTAA ATAAACTAAT 3060 
AAC 



(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 931 AMINO ACIDS

(B) TYPE: AMINO ACID

(C) STRANDEDNESS:

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: PROTEIN



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:



Met Lys Gln Leu Pro Ala Ala Thr Val Arg Leu Leu Ser Ser Ser
                  5                  10                  15


Gln Ile Ile Thr Ser Val Val Ser Val Val Lys Glu Leu Ile Glu
                 20                  25                  30


Asn Ser Leu Asp Ala Gly Ala Thr Ser Val Asp Val Lys Leu Glu
                 35                  40                  45


Asn Tyr Gly Phe Asp Lys Ile Glu Val Arg Asp Asn Gly Glu Gly
                 50                  55                  60


Ile Lys Ala Val Asp Ala Pro Val Met Ala Met Lys Tyr Tyr Thr
                 65                  70                  75


Ser Lys Ile Asn Ser His Gly Asp Leu Glu Asn Leu Thr Thr Tyr
                 80                  85                  90


Gly Phe Arg Gly Glu Ala Leu Gly Ser Ile Cys Cys Ile Ala Glu
                 95                 100                 105


Val Leu Ile Thr Thr Arg Thr Ala Ala Asp Asn Phe Ser Thr Gln
                110                 115                 120


Tyr Val Leu Asp Gly Ser Gly His Ile Leu Ser Gln Lys Pro Ser
                125                 130                 135


His Leu Gly Gln Gly Thr Thr Val Thr Ala Leu Arg Leu Phe Lys
                140                 145                 150


Asn Leu Pro Val Arg Lys Gln Phe Tyr Ser Thr Ala Lys Lys Cys
                155                 160                 165


Lys Asp Glu Ile Lys Lys Ile Gln Asp Leu Leu Met Ser Phe Gly
                170                 175                 180


Ile Leu Lys Pro Asp Leu Arg Ile Val Phe Val His Asn Lys Ala
                185                 190                 195


Val Ile Trp Gln Lys Ser Arg Val Ser Asp His Lys Met Ala Leu
                200                 205                 210





Ser Val 


Leu 


Gly 


Thr 


Ala 


Val 


Met 


Asn 


Asn 


Met 


Glu 


f>pi^ 


Phe 


















220 












Gln 


Tyr His 


Ser 


Glu 


Glu 


Ser 


Gln 


Ile 


Tyr 

X X 


Leu 


Ser 


Gly 


Phe 


Leu 






^ J V 










4b <D 3 








V 


Pro 




Asn 


Ala 




His 


Ser 


Phe 


Thr 


Ser 


TiPii 

XJGU 


OCX 


Thr 

X AAX, 


C^X W 


















Z 3 U 










9 C c; 

kS33 


m 11 


AX^ OCX 


irllC 


xxc 


PVl» 

if UTS 


xxc 


Asn 

AOXl 


OBX 




IrX LI 


Val 

VAX 


nxo 


ni n 


T.xre 

xiys 


















^03 










^ / u 


A fin 




JJ jr D 


Tt^ii 
ucu 


Tip 
xxc 


Atyt 


His 




Tyr 


Asn 


XJCU 


Lys 




XJCU 


















0 p n 
^ o u 










O R c: 
46 03 




mill QOT" 






XJC Ll 


Tyr 




Val 

V CLX 


PViP 
xrllc 




Leu 




Tip 
XXc 


2i er^ 








^ Z7 u 










^ 73 










"inn 


vctx 


It 1. C J. 


Ala 


A CT^ 


Val 

V CLX 


Asn 


Val 
V ax 


Asn 


iJCU 


X IIX 


It X U 


et^ 
Ao^ 


T .\re. 

Xiyo 


OCX 


















3XU 










3X3 




vdx xicu 


ucu 


m n 

wXll 


Asn 


T.vre 

XljfD 


cx^ 11 

wXU 


OCX 


Val 


T.oii 

ijcu 


Tl P 
XXc 


Ala 
Axel 


iJCU 


ni 11 

IJXU 


















^ 9 c 










^ 1 n 
3 J U 


AO 11 




X IIX 


X liX 


uys 


Tyr 


\9X Jf 


Pv*rt 
irxu 




Ptv\ 


Gov 
OCX 


Xxlx 


2V en 
As 11 


Oat* 

oer 








"a "3 c 

J J 3 










>l A 

J4U 










1 c 

345 


Tyr 




IV en 


Xiy D 


XXlx 




vaX 


C 

OCX 


Ala 
AX a 


Ala 
Axa 


Asp 


X le 


Val 


Leu 








"3 C /> 










*5 c c 

355 








^ ^ A 

360 




Lys Thr 


Ala 


fill 
uXU 


mr 


Asp 


Val 


Leu 


Phe 


Asn 


Lys 


Val 


Glu 


Ser 








365 










370 










375 


Ser Gly Lys Asn Tyr Ser Asn Val Asp Thr Ser Val Ile Pro Phe
                380                 385                 390


Gln Asn Asp Met His Asn Asp Glu Ser Gly Lys Asn Thr Asp Asp
                395                 400                 405


Cys Leu Asn His Gln Ile Ser Ile Gly Asp Phe Gly Tyr Gly His
                410                 415                 420


Cys Ser Ser Glu Ile Ser Asn Ile Asp Lys Asn Thr Lys Asn Ala
                425                 430                 435




Gln Asp 


Ile 


Ser 


Met 


Ser 


Asn 


ir«» 1 

Val 


Ser 


Trp 


Glu 


Asn 


Ser 


Gln 








440 










445 








450 


Thr Glu Tyr Ser Lys Thr Cys Phe Ile Ser Ser Val Lys His Thr
                455                 460                 465




Ser Glu 


Asn 


Gly 


Asn 


Lys 


Asp 


His 


Ile 


Asp 


Glu 


Ser 


Gly 


Glu 








470 










475 










480 


ADll 


ni 11 niii 

OXU 


f51 11 

V9XU 


Al 3 




T .oi 1 


m 11 
oxu 


21 en 


Ser 


oer 


uXU 


Tl ^ 
±Xc 


Ser 


Axa 








A n 

ft O 3 










yi o A 

490 










495 




Gl 11 TTTrt 


OCX 


A ITO 


i^xy 


A en 


Tip 

xxc 


T.All 

iJCU 


T.\^e 

xiys 


A en 
AS 11 


Ser 


Val 
Vax 


Gly 


r*i 11 


















31/3 








CI A 

OXO 


Asn 


Ile Glu 


P-rn 


Val 

V O Xi 


XJ jr a 


Tip 

X X.C 


XJCU 


Val 


irx Q 


rsi 11 

OXU 


Lys 


ocr 


Leu 


rTO 








3X3 










3Z U 








3^3 


Cys Lys Val Ser Asn Asn Asn Tyr Pro Ile Pro Glu Gln Met Asn
                530                 535                 540


Leu Asn Glu Asp Ser Cys Asn Lys Lys Ser Asn Val Ile Asp Asn
                545                 550                 555


Lys Ser Gly Lys Val Thr Ala Tyr Asp Leu Leu Ser Asn Arg Val
                560                 565                 570


Ile Lys Lys Pro Met Ser Ala Ser Ala Leu Phe Val Gln Asp His
                575                 580                 585


Arg Pro Gln Phe Leu Ile Glu Asn Pro Lys Thr Ser Leu Glu Asp
                590



Ala 


Thr 


Leu 




xxe 


m n 

V7XU 


m 11 










£ n c 






Glu 


Lys 


Leu 


Lys 


Tyr 


oxU 


oXU 










620 






Tyr Asn 


Ser 


Gln 


Met 


Lys 


Arg 










635 






Ser 


Leu 


Lys 


Asp 


oxy 


Arg 


iiys 










650 






Asn 


Leu 


A. J. a 


m n 


x»ys 


flXo 


Xijr o 










^ ^ c 
bob 






Pro 


Xaa 


Leu 




V3X U 


T.01 1 












C 0 

o o u 






Ser 


Gln 


TV en 




U jr O 


MPt' 


Val 
















Leu Lys 






Flits 


Ujr O 












/ X U 






Lys 


Asp 


V3XU 


Pro 


Cys 


LiSU 


XXC 
















Trp 


Leu 


raec 


inr 


Cot* 


T .xwe* 

iiys 


lllx 








740 






Arg 


Val 


Glu 


Glu 


Ala 


Leu 


Leu 








755 






Lys 


Leu 


Pro 


Ala 


Glu 


Pro 


Leu 








770 






Ser 


Leu 


Phe 


Asn 


Gly 


Ser 


His 










785 






Thr 


Ala 


Asp 


Asp 


Gln 


Arg 


Tyr 










800 






Pro 


Arg 


Leu 


Thr 


Ala 


Asn 


Gly 










815 






Val 


Ser 


Ile 


Thr 


Glu 


Asn 


Tyr 










830 






Cys 


Leu 


Pro 


Phe 


Tyr 


Gly 


Val 








845 






Ala 


Ile 


Leu 


Asn 


Arg 


Asn 


aXo 










860 






Arg 


Lys 


Val 


Ile 


Ser 


Tyr 


Leu 










875 






Arg 


Gln 


Leu 


Pro 


Met 


Tyr 


Leu 










890 






Ile 


Tyr 


Arg 


Met 


Lys 


His 


Gln 










905 






Val 


His 


Gly 


Arg 


Pro 


Phe 


Phe 



920 
Thr 

















600 


Leu 


Trp 


Lys 


XZlx 


T.Al 1 


Cot* 


ox ki 


Glu 

WX U 




DXU 










V X^ 


Lys 


iixa 


lllx 


Lys 


iisp 


XltSU 




















0^ w 


Axa 


x±e 


V7XU 


r^i n 

i^in 


r»l 11 

olU 


Gat" 

oer 


VsrXIi 


1*1 






£ A A 












Lys 


xxe 


Lys 


Pro 


iiir 


Ser 


AXo 


ixp 






£ C C 










w 0 U 


Leu 


Lys 


Thr 


Ser 


Leu 


Ser 


Asn 


m n 




b /U 












(jxn 


Ser 


oXn 


T 1 0 

xxe 


r"! 11 

oXU 






/-I J. y 






0 0 ^ 










690 


V3XXi 


Tip 


f X \J 




Ser 


Met 


Lys 


Asn 






700 










705 


m n 

wXli 


Hen 




Val 


2VST1 


Leu 


Glu 


Glu 






715 










720 


XIX D 


noil 


TiPii 

JJCU 


^*xy 




Pro 


Asp 


Ala 






730 








735 


vsxU 


vax 


weu 


Leu 


T All 






Tyr 


















Phe 


Lys 


Arg 


Leu 


Leu 


^lU 


Asn 


xlxS 






/bU 










7^i*; 


Glu 


Lys 


Pro 


T 1 

Ile 


Met 


Leu 


Thr 


01 11 
V7IU 




/7d 










/ D U 


Tyr 


Leu 


Asp 


Val 


Leu 


Tyr 


Lys 


Met 


















Ser 


Gly 


Ser 


Thr 


Tyr 


Leu 


Ser 


Asp 






Q A C 










0 J. w 


Phe 


Lys 


Ile 


Lys 


Leu 


Tl *!l 

Ile 


Pro 


r^i 

triy 






a 0 n 
820 










a 0 c 


Leu 


Glu 


Ile 


Glu 


Gly 


Met 


Ala 


Asn 






835 










Q il A 


Ala 


Asp 


Leu 


Lys 


Glu 


Ile 


Leu 


Asn 




650 










0 c c 
bob 


Lys 


Glu 


Val 


Tyr 


Glu 


Cys 


Arg 


Pro 






865 










0 T 
0 /U 


Glu 


Gly 


Glu 


Ala 


Val 


Arg 


Leu 


Ser 






880 










885 


Ser 


Lys 


Glu 


Asp 


Ile 


Gln 


Asp 


Ile 






895 










900 


Phe 


Gly 


Asn 


Glu 


Ile 


Lys 


Glu 


Cys 






910 










915 


His 


His 


Leu 


Thr 


Tyr 


Leu 


Pro 


Glu 






925 








930 



(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 2771 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS : SINGLE 



(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 

CX5AGGCGGAT CGGGTGTTGC ATCCATGGAG CGAGCTGAGA GCTCCAGTAC AGAACCTGCT 60
AAGGCCATCA AACCTATTGA TCGGAAGTCA GTCCATCAGA TTTGCTCTGG GCAGGTGGTA 120
CTGAGTCTAA GCACTGCGGT AAAGGAGTTA GTAGAAAACA GTCTGGATGC TGGTGCCACT 180
AATATTGATC TAAAGCTTAA GGACTATGGA GTGGATCTTA TTGAAGTTTC AGACAATGGA 240
TCTGGGGTAG AAGAAGAAAA CTTCGAAGGC TTAACTCTGA AACATCACAC ATCTAAGATT 300
CAAGAGTTTG CCGACCTAAC TCAGGTTGAA ACTTTTGGCT TTCGGGGGGA AGCTCTGAGC 360
TCACTTTGTG CACTGAGCGA TGTCACCATT TCTACCTGCC ACGCATCGGC GAAGGTTGGA 420
ACTCGACTGA TGTTTGATCA CAATGGGAAA ATTATCCAGA AAACCCCCTA CCCCCGCCCC 480
AGAGGGACCA CAGTCAGCGT GCAGCAGTTA TTTTCCACAC TACCTGTGCG CCATAAGGAA 540
TTTCAAAGGA ATATTAAGAA GGAGTATGCC AAAATGGTCC AGGTCTTACA TGCATACTGT 600
ATCATTTCAG CAGGCATCCG TGTAAGTTGC ACCAATCAGC TTGGACAAGG AAAACGACAG 660
CCTGTGGTAT GCACAGGTGG AAGCCCCAGC ATAAAGGAAA ATATCGGCTC TGTGTTTGGG 720
CAGAAGCAGT TGCAAAGCCT CATTCCTTTT GTTCAGCTGC CCCCTAGTGA CTCCGTGTGT 780
GAAGAGTACG GTTTGAGCTG TTCGGATGCT CTGCATAATC TTTTTTACAT CTCAGGTTTC 840
ATTTCACAAT GCACGCATGG AGTTGGAAGG AGTTCAACAG ACAGACAGTT TTTCTTTATC 900
AACCGGCGGC CTTGTGACCC AGCAAAGGTC TGCAGACTCG TGAATGAGGT CTACCACATG 960
TATAATCGAC ACCAGTATCC ATTTGTTGTT CTTAACATTT CTGTTGATTC AGAATGCGTT 1020
GATATCAATG TTACTCCAGA TAAAAGGCAA ATTTTGCTAC AAGAGGARAA GCTTTTGTTG 1080
GCAGTTTTAA AGACCTCTTT GATAGGAATG TTTGATAGTG ATGTCAACAA GCTAAATGTC 1140
AGTCAGCAGC CACTGCTGGA TGTTGAAGGT AACTTAATAA AAATGCATGC AGCGGATTTG 1200
GAAAAGCCCA TGGTAGAAAA GCAGGATCAA TCCCCTTCAT TAAGGACTGG AGAAGAAAAA 1260
AAAGACGTGT CCATTTCCAG ACTGCGAGAG GCCTTTTCTC TTCGTCACAC AACAGAGAAC 1320
AAGCCTCACA GCCCAAAGAC TCCAGAACCA AGAAGGAGCC CTCTAGGACA GAAAAGGGGT 1380
ATGCTGTCTT CTAGCACTTC AGGTGCCATC TCTGACAAAG GCGTCCTGAG ACCTCAGAAA 1440
GAGGCAGTGA GTTCCAGTCA CGGACCCAGT GACCCTACGG ACAGAGCGGA GGTGGAGAAG 1500
GACTCGGGGC ACGGCAGCAC TTCCGTGGAT TCTGAGGGGT TCAGCATCCC AGACACGGGC 1560
AGTCACTGCA GCAGCGAGTA TGCGGCCAGC TCCCCAGGGG ACAGGGGCTC GCAGGAACAT 1620
GTGGACTCTC AGGAGAAAGC GCCTGAAACT GACGACTCTT TTTCAGATGT GGACTGCCAT 1680
TCAAACCAGG AAGATACCGG ATGTAAATTT CGAGTTTTGC CTCAGCCAAC TAATCTCGCA 1740
ACCCCAAACA CAAAGCGTTT TAAAAAAGAA GAAATTCTTT CCAGTTCTGA CATTTGTCAA 1800
AAGTTAGTAA ATACTCAGGA CATGTCAGCC TCTCAGGTTG ATGTAGCTGT GAAAATTAAT 1860
AAGAAAGTTG TGCCCCTGGA CTTTTCTATG AGTTCTTTAG CTAAACGAAT AAAGCAGTTA 1920
CATCATGAAG CACAGCAAAG TGAAGGGGAA CAGAATTACA GGAAGTTTAG GGCAAAGATT 1980
TGTCCTGGAG AAAATCAAGC AGCCGAAGAT GAACTAAGAA AAGAGATAAG TAAAACGATG 2040
TTTGCAGAAA TGGAAATCAT TGGTCAGTTT AACCTGGGAT TTATAATAAC CACACTGAAT 2100
GAGGATATCT TCATAGTGGA CCAGCATGCC ACGGACGAGA AGTATAACTT CGAGATGCTG 2160
CAGCAGCACA CCGTGCTCCA GGGGCAGACG CTCATAGCAC CTCAGACTCT CAACTTAACT 2220
GCTGTTAATG AAGCTGTTCT GATAGAAAAT CTGGAAATAT TTAGAAAGAA TGGCTTTGAT 2280
TTTGTTATCG ATGAAAATGC TCCAGTCACT GAAAGGGCTA AACTGATTTC CTTGCCAACT 2340
AGTAAAAACT GGACCTTCGG ACCCCAGGAC GTCGATGAAC TGATCTTCAT GCTGAGCGAC 2400
AGCCCTGGGG TCATGTGCCG GCCTTCCCGA GTCAAGCAGA TGTTTGCCTC CAGAGCCTGC 2460
CGGAAGTCGG TGATGATTGG GACTGCTCTT AACACAAGCG AGATGAAGAA ACTGATCACC 2520
CACATGGGGG AGATGGACCA CCCCTGGAAC TGTCCCCATG GAAGGCCAAC CATGAGACAC 2580
ATCGCCAACC TGGGTGTCAT TTCTCAGAAC TGACCGTAGT CACTGTATGG AATAATTGGT 2640
TTTATCGCAG ATTTTTATGT TTTGAAAGAC AGAGTCTTCA CTAACCTTTT TTGTTTTAAA 2700
ATGAAACCTG CTACTTAAAA AAAATACACA TCACACCCAT TTAAAAGTGA TCTTGAGAAC 2760
CTTTTCAAAC C 2771

(2) INFORMATION FOR SEQ ID NO: 6: 
(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 862 AMINO ACIDS 

(B) TYPE: AMINO ACID 

(C) STRANDEDNESS: 

(D) TOPOLOGY: LINEAR 
(ii) MOLECULE TYPE: PROTEIN 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 



Met Glu Arg 


Ala 


Glu 


Ser 


Ser 


Ser 


Thr 


Glu 


Pro 


A J. CI 




Ala 


Ile 








5 










10 










15 


Lys Pro Ile Asp Arg Lys Ser Val His Gln Ile Cys Ser Gly Gln
                 20                  25                  30


Val Val Leu Ser Leu Ser Thr Ala Val Lys Glu Leu Val Glu Asn
                 35                  40                  45


Ser Leu Asp Ala Gly Ala Thr Asn Ile Asp Leu Lys Leu Lys Asp
                 50                  55                  60


Tyr Gly Val Asp Leu Ile Glu Val Ser Asp Asn Gly Cys Gly Val
                 65                  70                  75


Glu Glu Glu Asn Phe Glu Gly Leu Thr Leu Lys His His Thr Ser
                 80                  85                  90


Lys Ile Gln Glu Phe Ala Asp Leu Thr Gln Val Glu Thr Phe Gly
                 95                 100                 105


Phe Arg Gly Glu Ala Leu Ser Ser Leu Cys Ala Leu Ser Asp Val
                110                 115                 120


Thr Ile Ser Thr Cys His Ala Ser Ala Lys Val Gly Thr Arg Leu
                125                 130                 135


Met Phe Asp His Asn Gly Lys Ile Ile Gln Lys Thr Pro Tyr Pro
                140                 145                 150


Arg Pro Arg Gly Thr Thr Val Ser Val Gln Gln Leu Phe Ser Thr
                155                 160                 165


Leu Pro Val Arg His Lys Glu Phe Gln Arg Asn Ile Lys Lys Glu
                170                 175                 180


Tyr Ala Lys Met Val Gln Val Leu His Ala Tyr Cys Ile Ile Ser
                185                 190                 195


Ala Gly Ile Arg Val Ser Cys Thr Asn Gln Leu Gly Gln Gly Lys
                200                 205                 210


Arg Gln Leu Trp Tyr Ala Gln Val Glu Ala Pro Ala Ile Lys Glu
                215                 220                 225


Asn Ile Gly Ser Val Phe Gly Gln Lys Gln Leu Gln Ser Leu Ile
                230                 235                 240


Pro Phe Val Gln Leu Pro Pro Ser Asp Ser Val Cys Glu Glu Tyr
                245                 250                 255


Gly Leu Ser Cys Ser Asp Ala Leu His Asn Leu Phe Tyr Ile Ser
                260                 265                 270


Gly Phe Ile Ser Gln Cys Thr His Gly Val Gly Arg Ser Ser Thr
                275                 280                 285


Asp Arg Gln Phe Phe Phe Ile Asn Arg Arg Pro Cys Asp Pro Ala
                290                 295                 300


Lys Val Cys Arg Leu Val Asn Glu Val Tyr His Met Tyr Asn Arg
                305                 310                 315


His Gln Tyr Pro Phe Val Val Leu Asn Ile Ser Val Asp Ser Glu
                320                 325                 330


Cys Val Asp Ile Asn Val Thr Pro Asp Lys Arg Gln Ile Leu Leu
                335                 340                 345

Gln Glu Glu Lys Leu Leu Leu Ala Val Leu Lys Thr Ser Leu Ile
                350                 355                 360

Gly Met Phe Asp Ser Asp Val Asn Lys Leu Asn Val Ser Gln Gln
                365                 370                 375

Pro Leu Leu Asp Val Glu Gly Asn Leu Ile Lys Met His Ala Ala
                380                 385                 390

Asp Leu Glu Lys Pro Met Val Glu Lys Gln Asp Gln Ser Pro Ser
                395                 400                 405

Leu Arg Thr Gly Glu Glu Lys Lys Asp Val Ser Ile Ser Arg Leu
                410                 415                 420

Arg Glu Ala Phe Ser Leu Arg His Thr Thr Glu Asn Lys Pro His
                425                 430                 435

Ser Pro Lys Thr Pro Glu Pro Arg Arg Ser Pro Leu Gly Gln Lys
                440                 445                 450

Arg Gly Met Leu Ser Ser Ser Thr Ser Gly Ala Ile Ser Asp Lys
                455                 460                 465

Gly Val Leu Arg Pro Gln Lys Glu Ala Val Ser Ser Ser His Gly
                470                 475                 480

Pro Ser Asp Pro Thr Asp Arg Ala Glu Val Glu Lys Asp Ser Gly
                485                 490                 495

His Gly Ser Thr Ser Val Asp Ser Glu Gly Phe Ser Ile Pro Asp
                500                 505                 510

Thr Gly Ser His Cys Ser Ser Glu Tyr Ala Ala Ser Ser Pro Gly
                515                 520                 525

Asp Arg Gly Ser Gln Glu His Val Asp Ser Gln Glu Lys Ala Pro
                530                 535                 540

Glu Thr Asp Asp Ser Phe Ser Asp Val Asp Cys His Ser Asn Gln
                545                 550                 555

Glu Asp Thr Gly Cys Lys Phe Arg Val Leu Pro Gln Pro Thr Asn
                560                 565                 570

Leu Ala Thr Pro Asn Thr Lys Arg Phe Lys Lys Glu Glu Ile Leu
                575                 580                 585

Ser Ser Ser Asp Ile Cys Pro Gln Leu Val Asn Thr Gln Asp Met
                590                 595                 600

Ser Ala Ser Gln Val Asp Val Ala Val Lys Ile Asn Lys Lys Val
                605                 610                 615

Val Pro Leu Asp Phe Ser Met Ser Ser Leu Ala Lys Arg Ile Lys
                620                 625                 630

Gln Leu His His Glu Ala Gln Gln Ser Glu Gly Glu Gln Asn Tyr
                635                 640                 645

Arg Lys Phe Arg Ala Lys Ile Cys Pro Gly Glu Asn Gln Ala Ala
                650                 655                 660

Glu Asp Glu Leu Arg Lys Glu Ile Ser Lys Thr Met Phe Ala Glu
                665                 670                 675

Met Glu Ile Ile Gly Gln Phe Asn Leu Gly Phe Ile Ile Thr Thr
                680                 685                 690

Leu Asn Glu Asp Ile Phe Ile Val Asp Glu His Ala Thr Asp Glu
                695                 700                 705

Lys Tyr Asn Phe Glu Met Leu Gln Gln His Thr Val Leu Gln Gly
                710                 715                 720

Gln Arg Leu Ile Ala Pro Glu Thr Leu Asn Leu Thr Ala Val Asn
                725                 730                 735

Glu Ala Val Leu Ile Glu Asn Leu Glu Ile Phe Arg Lys Asn Gly
                740                 745                 750


Phe Asp Phe Val Ile Asp Glu Asn Ala Pro Val Thr Glu Arg Ala
                755                 760                 765


Lys Leu Ile Ser Leu Pro Thr Ser Lys Asn Trp Thr Phe Gly Pro
                770                 775                 780


Gln Asp Val Asp Glu Leu Ile Phe Met Leu Ser Asp Ser Pro Gly
                785                 790                 795


Val Met Cys Arg Pro Ser Arg Val Lys Gln Met Phe Ala Ser Arg
                800                 805                 810


Ala Cys Arg Lys Ser Val Met Ile Gly Thr Ala Leu Asn Thr Ser
                815                 820                 825


Glu Met Lys Lys Leu Ile Thr His Met Gly Glu Met Asp His Pro
                830                 835                 840


Trp Asn Cys Pro His Gly Arg Pro Thr Met Arg His Ile Ala Asn
                845                 850                 855


Leu Gly Val Ile Ser Gln Asn
                860

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS : SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
GTTGAACATC TAGACGTCTC 
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 19 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS : SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

TCGTGGCAGG GGTTATTCG 
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 19 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

CTACCCAATG CCTCAACCG 

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 22 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

GAGAACTGAT AGAAATTGGA TG 
(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 18 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 
(D) TOPOLOGY: LINEAR 
(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11 
GGGAC&TGUIG GITCTCGG 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 19 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12 

GGGCTGTGTG AATCCTCAG 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS : SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13 

CGGTTCACCA CTGTCTCGTC 
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 18 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14 

TCCAGGATGC TCTCCTCG 

(2) INFORMATION FOR SEQ ID NO: 15: 
(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS : SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

CAAGTCCTGG TAGCAAAGTC 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 19 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

ATGGCAAGGT CAAAGAGCG 
(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 22 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

CAACAATGTA TTCAGNAAGT CC 
(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 21 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
TTGATACAAC ACTTTGTATC G 
(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 21 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19 

GG7UVTACTAT CAGAAGGCAA G 
(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 21 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20 

ACAGAGCAAG TTACTCAGAT G 
(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 
(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21 

GTACACAATG CAGGCATTAG 
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 21 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 
(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22 

AATGTGGATG TTAATGTGCA C 

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 19 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS : SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23 

CTGACCTCGT CTTCCTAC 

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 19 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 
(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24 
CAGCAAGATG AGGAGATGC 

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 21 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25 

GGAAATGGTG GAAGATGATT C 

(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 16 BASE PAIRS 
(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
CTTCTCAACA CCAAGC 

(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 21 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS : SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 

GAAATTGATG AGGAAGGGAA C 
(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 22 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 

CTTCTGATTG ACAACTATGT GC 

(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 22 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 

CACAGAAGAT GGAAATATCC TG 
(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 

GTGTTGGTAG CACTTAAGAC 

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 

TTTCCCATAT TCTTCACTTG 

(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 19 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 

GTAACATGAG CCACATGGC 

(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 19 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33 
CCACTGTCTC GTCCAGCCG 

(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 26 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS : SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34 

CGGGATCCAT GTCGTTCGTG GCAGGG 

(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 26 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35 

GCTCTAGATT AACACCTCTC AAAGAC 
(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 21 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 
(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36 

GCATCTAGAC GTTTCCTTGG C 
(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 
(D) TOPOLOGY: LINEAR 
(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37: 

CATCCAAGCT TCTGTTCCCG 

(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 19 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS : SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 

GGGGTGCAGC AGCACATCG 

(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 

GGAGGCAGAA TGTGTGAGCG 

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 19 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:40: 

TCCCAAAGAA GGACTTGCT 

(2) INFORMATION FOR SEQ ID NO: 41: 
(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 22 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS : SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41: 

AGTATAAGTC TTAAGTGCTA CC 
(2) INFORMATION FOR SEQ ID NO: 42: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 
(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 42: 

TTTATGGTTT CTCACCTGCC 
(2) INFORMATION FOR SEQ ID NO: 43: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 19 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 

GTTATCTGCC CACCTCAGC 

(2) INFORMATION FOR SEQ ID NO: 44: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 59 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 
GGATCCTAAT ACGACTCACT ATAGGGAGAC CACCATGGCA TCTAGACGTT TCCCTTGGC 
(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS : SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 45: 
CATCCAAGCT TCTGTTCCCG 
(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 56 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 46: 

GGATCCTAAT ACGACTCACT ATAGGGAGAC CACCATGGGG GTGCAGCAGC ACATCG 
(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:

GGAGGCAGAA TGTGTGAGCG 
(2) INFORMATION FOR SEQ ID NO: 48: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 28 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: Oligonucleotide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48:
CGGGATCCAT GAAACAATTG CCTGCGGC 
(2) INFORMATION FOR SEQ ID NO: 49: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 26 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49: 

GCTCTAGACC AGACTCATGC TGTTTT
(2) INFORMATION FOR SEQ ID NO: 50: 

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 26 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 50: 
CGGGATCCAT GGAGCGAGCT GAGAGC
(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 23 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 

GCTCTAGAGT GAAGACTCTG TCT 

(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS 
(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52: 

AAGCTGCTCT GTTAAAAGCG 

(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 18 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53: 

GCACCAGCAT CCAAGGAG 

(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 19 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54: 

GAACCATGAG ACACATCGC

(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 

AGGTTAGTGA AGACTCTGTC

(2) INFORMATION FOR SEQ ID NO: 56:

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 53 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 

GGATCCTAAT ACGACTCACT ATAGGGAGAC CACCATGGAA CAATTGCCTG CGG 
(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 18 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57: 

CCTGCTCCAC TCATCTGC

(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 60 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:58: 
GGATCCTAAT ACGACTCACT ATAGGGAGAC CACCATGGAA GATATCTTAA AGTTAATCCG
(2) INFORMATION FOR SEQ ID NO: 59: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 21 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59:
GGCTTCTTCT ACTCTATATG G 
(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 58 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60: 

GGATCCTAAT ACGACTCACT ATAGGGAGAC CACCATGGCA GGTCTTGAAA ACTCTTCG 

(2) INFORMATION FOR SEQ ID NO: 61: 

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 21 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61: 

AAAACAAGTC AGTC3AATCCT C 

(2) INFORMATION FOR SEQ ID NO: 62: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62:

AAGCACATCT GTTTCTGCTG

(2) INFORMATION FOR SEQ ID NO: 63: 

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 20 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 
(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: Oligonucleotide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63:

ACGAGTAGAT TCCTTTAGGC 

(2) INFORMATION FOR SEQ ID NO: 64: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 19 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64: 

CAGAACTGAC ATGAGAGCC 

(2) INFORMATION FOR SEQ ID NO: 65: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 52 BASE PAIRS

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65: 

GGATCCTAAT ACGACTCACT ATAGGGAGAC CACCATGGAG CGAGCTGAGA GC
(2) INFORMATION FOR SEQ ID NO: 66: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66:

AGGTTAGTGA AGACTCTGTC 
(2) INFORMATION FOR SEQ ID NO: 67:

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 17 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67: 

CTGAGGTCTC AGCAGGC
(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 57 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68: 

GGATCCTAAT ACGACTCACT ATAGGGAGAC CACCATGGTG TCCATTTCCA GACTGCG
(2) INFORMATION FOR SEQ ID NO: 69: 

(i) SEQUENCE CHARACTERISTICS

(A) LENGTH: 20 BASE PAIRS

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE
(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69: 

AGGTTAGTGA AGACTCTGTC
(2) INFORMATION FOR SEQ ID NO: 70: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 21 BASE PAIRS 

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70: 
TTATTTGGCA GAAAAGCAGA G
(2) INFORMATION FOR SEQ ID NO: 71: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 21 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71:

TTAAAAGACT AACOTCTTGC C 
(2) INFORMATION FOR SEQ ID NO: 72: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 21 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72:
CTGCTGTTAT GAACAATATG G
(2) INFORMATION FOR SEQ ID NO: 73: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 19 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73:

CAGAAGCAGT TGCAAAGCC 

(2) INFORMATION FOR SEQ ID NO: 74: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR

(ii) MOLECULE TYPE: Oligonucleotide
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74: 

AAACCGTACT CTTCACACAC 

(2) INFORMATION FOR SEQ ID NO: 75: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 20 BASE PAIRS 

(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75: 

GAGGAAAAGC TTTTGTTGGC 

(2) INFORMATION FOR SEQ ID NO: 76: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 18 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76: 

CAGTGGCTGC TGACTGAC 

(2) INFORMATION FOR SEQ ID NO: 77: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 19 BASE PAIRS 

(B) TYPE: NUCLEIC ACID 

(C) STRANDEDNESS: SINGLE 

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77: 

TCCAGAACCA AGAAGGAGC 

(2) INFORMATION FOR SEQ ID NO: 78: 

(i) SEQUENCE CHARACTERISTICS 

(A) LENGTH: 16 BASE PAIRS 
(B) TYPE: NUCLEIC ACID

(C) STRANDEDNESS: SINGLE

(D) TOPOLOGY: LINEAR 

(ii) MOLECULE TYPE: Oligonucleotide 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 78: 
TGAGGTCTCA GCAGGC

WHAT IS CLAIMED IS:

1. An isolated polynucleotide selected from the group 
consisting of: 

(a) a polynucleotide encoding a polypeptide having the 
deduced amino acid sequence of SEQ ID No. 2 or a fragment, analog 
or derivative of said polypeptide; 

(b) a polynucleotide encoding a polypeptide having the 
amino acid sequence encoded by the cDNA contained in ATCC Deposit 
No. 75649; 

(c) a polynucleotide encoding a polypeptide having the 
deduced amino acid sequence of SEQ ID No. 4 or a fragment, analog 
or derivative of said polypeptide; 

(d) a polynucleotide encoding a polypeptide having the 
amino acid sequence encoded by the cDNA contained in ATCC Deposit 
No. 75651; 

(e) a polynucleotide encoding a polypeptide having the 
deduced amino acid sequence of SEQ ID No. 6 or a fragment, analog 
or derivative of said polypeptide; and 

(f) a polynucleotide encoding a polypeptide having the
amino acid sequence encoded by the cDNA contained in ATCC Deposit
No. 75650.

2. The polynucleotide of Claim 1 wherein the
polynucleotide is DNA. 

3. The polynucleotide of Claim 1 wherein the
polynucleotide is RNA. 

4. The polynucleotide of Claim 1 wherein the 
polynucleotide is genomic DNA.

5. The polynucleotide sequence of claim 1 for use in
analyzing a sample for mutation of a polynucleotide sequence
encoding a human mismatch repair protein comprising:

a polynucleotide sequence of at least 15 and no more
than 30 consecutive bases of the polynucleotide sequence of ATCC
Deposit No. 75649.

6. The polynucleotide sequence of claim 1 for use in
analyzing a sample for mutation of a polynucleotide sequence
encoding a human mismatch repair protein comprising:

a polynucleotide sequence of at least 15 and no more
than 30 consecutive bases of the polynucleotide sequence of
ATCC Deposit No. 75651.

7. The polynucleotide sequence of claim 1 for use in
analyzing a sample for mutation of a polynucleotide sequence
encoding a human mismatch repair protein comprising:

a polynucleotide sequence of at least 15 and no more
than 30 consecutive bases of the polynucleotide sequence of
ATCC Deposit No. 75650.

8. The polynucleotide of Claim 2 wherein said 
polynucleotide encodes a polypeptide having the deduced amino 
acid sequence of SEQ ID No. 2.

9. The polynucleotide of Claim 2 wherein said 
polynucleotide encodes a polypeptide having the deduced amino 
acid sequence of SEQ ID No. 4. 

10. The polynucleotide of Claim 2 wherein said
polynucleotide encodes a polypeptide having the deduced amino 
acid sequence of SEQ ID No. 6. 

11. The polynucleotide of Claim 2 wherein said 
polynucleotide encodes a polypeptide encoded by the cDNA of ATCC 
Deposit No. 75649. 
12. The polynucleotide of Claim 2 wherein said
polynucleotide encodes a polypeptide encoded by the cDNA of ATCC 
Deposit No. 75651. 

13. The polynucleotide of Claim 2 wherein said
polynucleotide encodes a polypeptide encoded by the cDNA of ATCC 
Deposit No. 75650. 

14. The polynucleotide of Claim 1 having the coding
sequence of SEQ ID No. 1. 

15. The polynucleotide of Claim 1 having the coding 
sequence of SEQ ID No. 3. 

16. The polynucleotide of Claim 1 having the coding
sequence of SEQ ID No. 5.

17. A vector containing the DNA of Claim 2. 

18. A host cell genetically engineered with the vector of
Claim 17.

19. A process for producing a polypeptide comprising:

expressing from the host cell of Claim 18 the polypeptide encoded 
by said DNA. 

20. A process for producing cells capable of expressing a
polypeptide comprising genetically engineering cells with the 
vector of Claim 17. 

21. An isolated DNA hybridizable to the DNA of Claim 2 and
encoding a polypeptide having hMLH1 activity.

22. An isolated DNA hybridizable to the DNA of Claim 2 and
encoding a polypeptide having hMLH2 activity.
23. An isolated DNA hybridizable to the DNA of Claim 2 and
encoding a polypeptide having hMLH3 activity.

24. A polypeptide selected from the group consisting of: 

(a) a polypeptide having the deduced amino acid 
sequence of SEQ ID No. 2 and fragments, analogs and derivatives 
thereof;

(b) a polypeptide encoded by the cDNA of ATCC Deposit 
No. 75649 and fragments, analogs and derivatives of said
polypeptide; 

(c) a polypeptide having the deduced amino acid 
sequence of SEQ ID No. 4 and fragments, analogs and derivatives
thereof;

(d) a polypeptide encoded by the cDNA of ATCC Deposit 
No. 75651 and fragments, analogs and derivatives of said 
polypeptide;

(e) a polypeptide having the deduced amino acid 
sequence of SEQ ID No. 6 and fragments, analogs and derivatives 
thereof; and

(f) a polypeptide encoded by the cDNA of ATCC Deposit
No. 75650 and fragments, analogs and derivatives of said 
polypeptide. 

25. The polypeptide of Claim 24 wherein the polypeptide is
hMLH1 having the deduced amino acid sequence of SEQ ID No. 2.

26. The polypeptide of Claim 24 wherein the polypeptide is
hMLH2 having the deduced amino acid sequence of SEQ ID No. 4.

27. The polypeptide of Claim 24 wherein the polypeptide is
hMLH3 having the deduced amino acid sequence of SEQ ID No. 6. 

28. A process for diagnosing a susceptibility to cancer
comprising:

determining from a sample derived from a human patient
a mutation in a human mismatch repair gene, said human mismatch
repair gene comprising the polynucleotide sequence of claim 8.

29. A process for diagnosing a susceptibility to cancer 
comprising:

determining from a sample derived from a human patient 
a mutation in a human mismatch repair gene, said human mismatch 
repair gene comprising the DNA of claim 9. 

30. A process for diagnosing a susceptibility to cancer 
comprising: 

determining from a sample derived from a human patient
a mutation in a human mismatch repair gene, said human mismatch
repair gene comprising the DNA of claim 10.

31. A process for diagnosing a susceptibility to cancer 
comprising: 

determining from a sample derived from a human patient 
a mutation in a human DNA mismatch repair gene which encodes the 
human homolog of a bacterial mutL DNA mismatch repair gene. 
International application No.

A. CLASSIFICATION OF SUBJECT MATTER:
US CL:



435/6, 192.1, 193.1; 530/300, 350, 358, 387.3, 388.21; 536/23.1, 23.4, 24.31

B. FIELDS SEARCHED 
Minimum documentation searched 
Classification System: U.S. 

435/6, 192.1, 193.1; 530/300, 350, 358, 387.3, 388.21; 536/23.1, 23.4, 24.31

Electronic data bases consulted (Name of data base and, where practicable, terms used):

BIOSIS, MEDLINE, EMBASE, CAPLUS, HCA, USPATFULL, WPIDS, CANCERLIT, GENBANK, GENBANK,
GENBANK-NEW, UEMBL (searched on seq IDs from related US case US08187757; CRF disk was defective)
Search terms: human DNA repair (genes or proteins), mutator genes, mutL, hMLH1, hMLH2, hMLH3, colon cancer,
microsatellite instability, Haseltine, Prolla, Liskay

BOX II. OBSERVATIONS WHERE UNITY OF INVENTION WAS LACKING
This ISA found multiple inventions as follows:

I. Claims 1-23, drawn to polynucleotides encoding polypeptides having the deduced amino acid sequences of
hMLH-encoded proteins, their analogs or derivatives, vectors containing said polynucleotides, host cells
genetically engineered with said vectors, and processes of growing said host cells.

II. Claims 24-27, drawn to polypeptides and methods of polypeptide production from host cells expressing hMLH
genes.

III. Claims 28-31, drawn to a process for diagnosing cancer susceptibility comprising identifying mutations in
hMLH1, hMLH2, hMLH3 and the human homolog of bacterial mutL.

An Election of Species for Groups I, II, and III is required wherein:

species A is drawn to hMLHl 

species B is drawn to hMLH2 

species C is drawn to hMLH3 

and wherein Group III has an additional species: 

species D, drawn to the human homolog of bacterial mutL. 



These groups are separate and distinct from each other. Group I is drawn to products which are polynucleotides, while
Group II is drawn to products which are polypeptides and to a process of making said polypeptides. The products of
Groups I and II have different structural and biochemical properties and may be used in distinctly different processes.
Polynucleotides may be used as probes in linkage analyses and in DNA-based genetic therapy, while polypeptides may be
used in protein-based therapies. While the product of Group I is linked to the process of Group II, these do not share a
common special technical feature according to PCT Rule 13.2, as 'analogs, derivatives and variants' of Group I are
known in the art (Horii et al., Biochem. Biophys. Res. Commun., 28 November 1994). For the same reasons, the
product of Group I is also not technically linked to the process of Group III.

Species A-C (Groups I and II) and A-D (Group III) do not relate to a single inventive concept under PCT Rule 13.1
because, under PCT Rule 13.2, the 'commonly shared structure' does not 'constitute a structurally distinctive portion in
view of the prior art', i.e. in view of Horii et al. 1994. Further, the nonobvious differences in sequence structure
between these genes render these genes structurally and functionally distinct. Accordingly, the claims are not so linked
by a special technical feature within the meaning of PCT Rule 13.2 as to form a single inventive concept.